Comma Separated Value, or CSV, is a way to store structured information in an ASCII file, making it in effect a very simple database. This simplicity has made it a very popular format for exchanging information between spreadsheets, databases, and other software systems.
In FME, CSV is treated as a database format. Each line in the file is a record, with a comma (or other) character between each field within the record.
There is often a header line to provide names for the fields. For example:
company, address, telephone, web
Safe Software Inc, 132nd Street Surrey, 604 501 9985, www.safe.com
If you check the 'File Has Field Names' parameter, this header will be used to provide attribute names; otherwise it will be treated as a record within the file. The 'Lines to Skip' parameter lets you specify how many lines to skip when the header takes up more than a single record/line.
Despite the name, a CSV file does not have to be comma separated.
FME is capable of recognizing a file with any separator or delimiter character, as long as you specify that character in the settings box.
The CSV reader is also capable of handling data consisting of fixed-width columns without a separator, provided that the character spacing the columns (usually a space) does not appear in the data itself. Simply use the 'Remove Duplicate Separators' parameter to achieve this. An alternative method is to use the Textfile reader with an AttributeSplitter transformer.
FME can be used to translate CSV data into other non-spatial formats such as Excel, Oracle, SQL Server or XML. Use the AttributeRenamer to change the names of attributes.
CSV files often contain spatial data listed as a series of X/Y coordinates. To turn these non-geometry features (records) into spatial features you must use FME transformers.
To get point features:
Don't forget to set the coordinate system on the CSV reader if you create points or lines!
Getting line features is a two-step process:
To get polygon (area) features:
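The overall logic is: each record supplies one coordinate, and records sharing an ID are joined in order into a line. As a minimal sketch outside FME, here is the same grouping in plain Python (the field names id, x, and y are assumptions, not the attached files' actual layout):

```python
import csv
import io

# Hypothetical sample data; in FME the equivalent joining is done by
# transformers rather than by hand.
data = """id,x,y
1,0.0,0.0
2,1.0,1.0
2,2.0,1.0
2,2.0,3.0
"""

# Group coordinates by feature ID, preserving file order.
features = {}
for row in csv.DictReader(io.StringIO(data)):
    features.setdefault(row["id"], []).append((float(row["x"]), float(row["y"])))

for fid, coords in features.items():
    kind = "point" if len(coords) == 1 else "line"
    print(fid, kind, coords)
```

Note that the grouping assumes the records for a feature are consecutive and in order, which is exactly why out-of-order records cause problems (see the FAQ below).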
Q) I tried to create line features, but the results were very erratic. Why?
A) The records in the CSV file were probably not in order. Use a Sorter transformer to sort your features by ID number. Alternatively, make sure you are using a break attribute parameter as described above.
Q) I read in my CSV file but I don’t see any graphics in the Viewer or output. Why?
A) Because FME can’t easily tell which fields contain references to an X/Y coordinate, all records in a CSV file are treated as a non-geometry feature; i.e. text records without graphics.
To turn records into spatial features use the methods described above. To view non-spatial data in the FME Viewer, use the toolbar button 'Select No Geometry'.
Q) My CSV data has a comma within the value for a field – will FME recognize this?
A) Each field in a CSV file can be contained within quote symbols (") – if this is the case, FME will recognize that a delimiter character inside the quotes is part of the field value rather than a field separator.
If you don’t have quotes, then a delimiter within a field will give you a badly formed CSV file that neither FME nor any other application will be able to read properly. A good example of a quoted delimiter is (note the comma after '132nd Street'):
company, address, telephone, web
"Safe Software Inc", "132nd Street, Surrey", "604 501 9985", www.safe.com
Q) My CSV file contains embedded linefeeds - can FME read these files?
A) In some cases CSV files contain linefeeds in the CSV fields, i.e. a,b,"data<lf>more data",d
FME will interpret the <lf> (linefeed) as a new line, so the single line of data will be read as two incomplete lines instead. You can preprocess the CSV file and remove the linefeeds using the attached workspace FixCSVPython.fmw.
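The attached workspace is the supported fix; purely as an illustration of the idea, a sketch of the same preprocessing in plain Python might look like this (this is not the workspace's actual code, and the simple quote-toggling below does not handle escaped "" quotes):

```python
# Replace any linefeed that falls inside a quoted field with a space,
# so each logical CSV record occupies exactly one physical line.
def fix_embedded_linefeeds(text):
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "\n" and in_quotes:
            out.append(" ")  # join the broken field back together
        else:
            out.append(ch)
    return "".join(out)

print(fix_embedded_linefeeds('a,b,"data\nmore data",d'))
```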
There are several example workspaces for reading CSV data on fmepedia, including:
This workspace (see attached csv feature reader.zip) takes X/Y coordinate pairs listed in a comma delimited (CSV) file and creates features.
The CSV file format is...
Where line is an ID number for the feature which is listed against each coordinate.
If there is only a single coordinate for that feature it becomes a point feature.
If there is more than one coordinate for that feature it becomes a line feature.
If the final coordinate matches the first it becomes a polygon feature.
In this example CSV file there are 3 lines, 1 point and 1 polygon.
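The point/line/polygon rule described above can be sketched as a small function (a plain-Python illustration of the workspace's logic, not its actual implementation):

```python
# Classify a feature by its list of (x, y) coordinates:
# one coordinate -> point; first equals last -> polygon; otherwise -> line.
def classify(coords):
    if len(coords) == 1:
        return "point"
    if coords[0] == coords[-1]:
        return "polygon"
    return "line"

print(classify([(0, 0)]))                           # a single coordinate
print(classify([(0, 0), (1, 1)]))                   # an open sequence
print(classify([(0, 0), (1, 0), (1, 1), (0, 0)]))   # a closed sequence
```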
This workspace (see attached CSV Reader 2.zip) handles the following data structure:
FEATURE,X,Y,Z,POINT
LINE,0,0,11,START
LINE,1,1,11,VERTEX
LINE,1,2,11,VERTEX
LINE,2,4,11,VERTEX
LINE,5,3,11,VERTEX
LINE,4,4,11,VERTEX
LINE,4,6,11,VERTEX
LINE,2,5,11,VERTEX
LINE,1,4,11,END
...where a start and end marker indicate the first and last records in the file.
The difference with this CSV file is that there is no unique ID number - each line is designated by a point type. START indicates the start point of a line, VERTEX indicates a vertex and END indicates the end point of the line.
The start point doesn't need to be indicated so we change it to VERTEX to be consistent with the vertices.
The end point needs to be different, but also included as a vertex, so a copy is made of it.
One end-point copy is given a point type of VERTEX and the other retains END.
PointConnector will then function correctly - although the end-of-line markers are output as 2 extra points and should be ignored.
Because the workspace relies on these start and end markers, it won't process point features.
The processing step puts the points out of order, so a Counter transformer is used prior to processing, and a Sorter transformer prior to the PointConnector to get them back into the correct order.
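The marker handling described above can be sketched in plain Python (the rows are hypothetical; this illustrates the step, it is not the workspace's code):

```python
# START becomes VERTEX, and each END record is duplicated so the last
# coordinate is kept both as a vertex and as an end-of-line marker.
rows = [
    ("LINE", 0, 0, 11, "START"),
    ("LINE", 1, 1, 11, "VERTEX"),
    ("LINE", 1, 4, 11, "END"),
]

processed = []
for feat, x, y, z, marker in rows:
    if marker == "START":
        processed.append((feat, x, y, z, "VERTEX"))
    elif marker == "END":
        processed.append((feat, x, y, z, "VERTEX"))  # copy used as a vertex
        processed.append((feat, x, y, z, "END"))     # original marker kept
    else:
        processed.append((feat, x, y, z, marker))

print(processed)
```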
This workspace (see attached CSV Reader 3.zip) takes x/y coord pairs listed in a file and creates lines.
TYP,ST_ID,PNT_ID,JOB_NUM,MEAS_ACC,NORTHING,EASTING,NEXT_POINT,PREV_POINT
MM,0002,-4230-75,030932,1,163212.74,258589.08,-4138-33,-4138-32
MM,0002,-4138-33,030932,1,163212.40,258579.29,-4138-34,-4230-75
MM,0002,-4138-34,030932,1,163211.97,258581.98,-4138-35,-4138-33
MM,0002,-4138-35,030932,1,163211.89,258565.12,-4138-36,-4138-34
MM,0002,-4138-36,030932,1,163211.74,258553.82,-4138-37,-4138-35
MM,0002,-4138-37,030932,1,163211.22,258554.15,-4138-38,-4138-36
MM,0002,-4138-38,030932,1,163211.56,258564.37,-4138-39,-4138-37
MM,0002,-4138-39,030932,1,163211.49,258567.05,-4138-8,-4138-38
MM,0002,-4138-32,030932,1,163212.16,258588.87,-4230-75,A
MM,0002,-4138-8,030932,1,163211.05,258577.80,Z,-4138-39
...where each record has a pointer to the next one in the sequence.
The difference with this CSV file is that the point ID numbers are not consecutive within the line - each point has an ID number and a pointer to point to the previous and next points in the line.
This sort of task normally needs some form of looping/iteration, which wasn't possible in Workbench at the time this was created. This workspace simulates iteration through a set of features by creating a copy of each feature for each iteration needed.
We create a list of all the points for a feature using a FeatureMerger, then create a separate feature containing the full list using a ListElementCounter/Cloner. Global variables are then used to define the next point to be read, and that point is extracted from the list with a ListIndexer. Note the use of the function @GlobalVariable(GV) in the ListSearcher to determine which element of the list to read.
NB: This all works only because FME processes a single feature as far as possible through the workspace. Not many people really understand that concept, but it's the entire basis for this workspace: one feature defines a global variable containing the next point ID, and the next feature uses that variable to determine which point to use from the list. In the same vein, the StatisticsCalculators here aren't needed to calculate statistics; they are used to force FME to stop at that point and gather all features together (in FME 2007 you could use the FeatureHolder instead).
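Conceptually, the workspace is following a linked list: each point names its successor, so walking from the start marker reconstructs the line order without any sorting. A minimal sketch with hypothetical point IDs (the A and Z markers match the sample data above):

```python
# Each point carries a pointer to the next and previous point in the line.
# "A" as the previous point marks the line's start; "Z" marks its end.
points = {
    "p1": {"next": "p2", "prev": "A"},
    "p3": {"next": "Z",  "prev": "p2"},
    "p2": {"next": "p3", "prev": "p1"},
}

# Find the start point, then follow the next-pointers to the end marker.
current = next(pid for pid, p in points.items() if p["prev"] == "A")
ordered = []
while current != "Z":
    ordered.append(current)
    current = points[current]["next"]

print(ordered)
```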
This workspace (see attached csvreader4.zip) takes a source CSV file (more a text file really) and converts it to a new structure.
The important part is taking individual records in the file and merging them into a single record. Each required feature starts with a record tagged LO, and contains TI records.
LO;9 781 662;01K;01K;RT-08961;N
TI;407141;62;CO
TI;468736;62;CO
TI;485293;62;CO
TI;517330;62;CO
TI;531405;62;CO
TI;563040;62;CO
TI;565006;62;CO
TI;593095;62;CO
TI;597124;62;CO
TI;598145;62;CO
TI;604331;62;CO
TI;607442;62;CO
LO;8 945 021;01K;01K;RT-08961;N
TI;598145;62;CO
TI;598148;62;CO
TI;598149;62;CO
TI;598144;62;CO
TI;598147;62;CO
CO;051260;A;776-101;N;NO;051260;A;776-101;N
The output from the above will be two features...
9 781 662;407141;62;CO;468736;62;CO;485293;62;CO;517330;62;CO;531405;62;CO;563040;62;CO;565006;62;CO;593095;62;CO;597124;62;CO;598145;62;CO;604331;62;CO;607442;62;CO
8 945 021;598145;62;CO;598148;62;CO;598149;62;CO;598144;62;CO;598147;62;CO;051260;A;776-101;N;NO;051260;A;776-101;N
The workspace assigns a unique ID to each feature, using a global variable to keep track of the ID as subsequent records pass through. Records with the same ID are grouped using an Aggregator, and a ListConcatenator is used to build the new record.
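The core of the merge step can be sketched in plain Python (an illustration of the logic, not the workspace's code; the sample lines are shortened versions of the data above):

```python
# Each LO record starts a new output record keyed by its ID (the second
# field); the TI records that follow are concatenated onto it.
lines = [
    "LO;9 781 662;01K",
    "TI;407141;62;CO",
    "TI;468736;62;CO",
    "LO;8 945 021;01K",
    "TI;598145;62;CO",
]

records = []
for line in lines:
    if line.startswith("LO"):
        records.append([line.split(";")[1]])   # start a new record with its ID
    else:
        records[-1].extend(line.split(";")[1:])

output = [";".join(rec) for rec in records]
print(output)
```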