In part 1 of this tutorial, we will use FME to prepare data for Tableau. More specifically, we will perform some data validation, manipulation and cleanup on business license data in CSV format before loading it to Tableau.
This tutorial consists of 2 parts:
1. How to Prepare Data for Tableau with FME (Processing Spreadsheets) (Current Article)
In Part 1, we'll process a single CSV file and perform data validation and cleanup before loading it to Tableau.
In Part 2, we’ll modify the FME workspace from Part 1 to handle multiple CSV files.
We will start by creating a simple translation, look at the results in Tableau, then come back to FME to do the data validation and cleanup.
1. Create new workspace.
Start FME Workbench and select the New option under Create Workspace.
2. Set up CSV Reader.
The first step is to read the CSV file. Drag the business license CSV file from the file explorer onto the blank canvas. Notice that FME has already filled in the reader format and dataset.
We can tell FME to convert the CSV file's latitude and longitude values to points as it reads the data. To do this, click on the Parameters button.
- First, double-check that the File Preview is correct.
- Then, under Schema Attributes, scroll down to the Latitude and Longitude columns and change their Type to y_coordinate and x_coordinate respectively. This tells FME to create a point for each record that has latitude and longitude values.
- Finally, the Database parameter allows us to choose different naming schemes for the layers, or feature types, that end up on the canvas. Make sure it is set to “Feature Type from File Names”.
When the CSV source data is added to the canvas, click on the arrow to see the full attribute list.
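Outside FME, the point-creation step above can be sketched in plain Python. This is only an illustration of the logic, not FME's API; the sample column names (LicenseRSN, BusinessName, Latitude, Longitude) are assumptions standing in for the real file's layout.

```python
import csv
import io

# Hypothetical sample mimicking the business license CSV; the real file's
# columns may differ.
sample = """LicenseRSN,BusinessName,Latitude,Longitude
100001,Acme Ltd,49.2827,-123.1207
100002,Globex,,
"""

points = []
for row in csv.DictReader(io.StringIO(sample)):
    lat, lon = row["Latitude"], row["Longitude"]
    # Mirror FME's behaviour: only rows with both coordinate values
    # produce a point geometry.
    if lat and lon:
        points.append((float(lon), float(lat)))  # (x, y) = (longitude, latitude)

print(points)  # [(-123.1207, 49.2827)]
```

The second sample row has no coordinates, so no point is created for it, which is exactly the situation the GeometryFilter will catch later in this tutorial.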
3. Confirm points are created as CSV file is read
Right-click on the source feature type, and select Connect Inspector. Go ahead and run the workspace. Data is read and directed to the FME Data Inspector. We can see that we do, in fact, have points.
4. Add Tableau writer
From the Writers menu, select Add Writer. For format, start typing Tableau, and select the Tableau Data Extract format. For Dataset, choose a directory in which to write the .tde file. Click OK to add the Tableau writer to the workspace.
5. Modify writer feature type properties
Open the properties of the writer feature type. We can now specify the name of the table we’d like to write to. Call it BusinessLicenses. Set the geometry to tde_point.
6. Run workspace
Run the workspace. You may notice a number of blue warnings go by in the log. These are related to problems with the CSV data, which we will soon fix with FME.
7. Examine output TDE file in Tableau
In Tableau, we can see that all of the columns correspond to the columns we read in from the CSV file with FME. Notice that the data types on the columns have all been set automatically by FME based on its best guess at what kind of data is inside each one. We can see that LicenseRSN has correctly been set to a Number type and BusinessName is a String.
Take a closer look at the business license data.
Here is a list of tasks we will accomplish with FME. We have already completed the first one in our current workspace. We will modify the workspace to perform the rest of the tasks.
1. Filter out records that don’t have latitude/longitude values
You may recall that some of the records in the CSV file did not have latitude and longitude values. Since FME would not have been able to create points for them, we want to filter them out; a GeometryFilter will help us accomplish this.
In the Transformer Gallery, expand the “All” folder, and find the GeometryFilter. Drag it (or double-click on it) to add it to the canvas and connect it to the CSV source. Open the properties of the GeometryFilter. Select “Point” for the “Geometry Types to Filter”. Click OK.
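Conceptually, the GeometryFilter routes features with point geometry to one output and everything else to its &lt;Unfiltered&gt; output. A minimal Python sketch of that routing (the record structure here is an assumption, not FME's internal representation):

```python
# Features that got lat/long converted to a point keep a geometry;
# features missing coordinates have none.
records = [
    {"LicenseRSN": 1, "geometry": (-123.1207, 49.2827)},
    {"LicenseRSN": 2, "geometry": None},  # no lat/long -> no point created
]

# Route point features one way, everything else to "unfiltered".
points = [r for r in records if r["geometry"] is not None]
unfiltered = [r for r in records if r["geometry"] is None]

print(len(points), len(unfiltered))  # 1 1
```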
2. Run workspace with full inspection
Go ahead and run the workspace, making sure that the Run with Full Inspection option is selected.
We can confirm that 731 of our 10,000 records did not have latitude/longitude values.
3. Write records with no latitude/longitude to "data cleanup" Excel file
Before we continue processing the points, let’s write these records out to a “Data Cleanup” file, so that they may be fixed. We will write them out to Excel.
From the Writers menu, add an Excel writer. Enter the name of a file to write to. Connect the new writer feature type up to the Unfiltered port of the GeometryFilter, then open up its properties and change the sheet name to “Missing Latitude Longitude”.
Now that we have dealt with the missing values, let’s continue processing the points.
4. Ensure PostalCode has values: AttributeValidator
Place an AttributeValidator on the canvas and connect it to the GeometryFilter. Open its properties and select PostalCode for the Attribute to Validate. The validation rule is that PostalCode must have a value. Also, validate that some of the string fields () have string values, and that the numeric fields () have numeric values.
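The AttributeValidator's rules amount to simple per-record checks. Here is an illustrative stand-in in plain Python (not FME's syntax); using LicenseRSN as the example numeric field is an assumption:

```python
def validate(record):
    """Return a list of validation failures for one record (empty = passed)."""
    failures = []
    # Rule: PostalCode must have a value.
    if not record.get("PostalCode"):
        failures.append("PostalCode has no value")
    # Rule: numeric fields must actually parse as numbers.
    try:
        float(record.get("LicenseRSN", ""))
    except ValueError:
        failures.append("LicenseRSN is not numeric")
    return failures

print(validate({"PostalCode": "V5K 0A1", "LicenseRSN": "100001"}))  # []
print(validate({"PostalCode": "", "LicenseRSN": "abc"}))
```

Records with an empty failure list go out the Passed port; the rest go out the Failed port, which we handle next.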
5. Write records that fail validation to new sheet in "data cleanup" Excel file
Again, before continuing to process the valid data, write the records that fail validation to a different sheet in the “Data Cleanup” Excel file we already set up. Right-click on the canvas and select “Insert Writer Feature Type”. Call the new sheet “Failed Validation”. Connect it to the Failed port of the AttributeValidator.
6. Extract first 3 characters of PostalCode: SubStringExtractor
Now that we know that the records output by the AttributeValidator all have a value for PostalCode, let’s extract the first 3 characters. We do this because Tableau uses the first 3 characters of the postal code to automatically map the areas.
Place a SubstringExtractor on the canvas, and configure it to extract the first 3 characters from the PostalCode attribute. We will call the resulting attribute PostCodeTrimmed.
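The SubstringExtractor step is a one-liner in most languages; here is the equivalent logic as a plain Python sketch (the sample postal code is made up for illustration):

```python
# Extract the first 3 characters of PostalCode into a new attribute,
# PostCodeTrimmed, as the SubstringExtractor is configured to do.
record = {"PostalCode": "V5K 0A1"}
record["PostCodeTrimmed"] = record["PostalCode"][:3]

print(record["PostCodeTrimmed"])  # V5K
```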
7. Set up conditional attribute values for FeePaid: AttributeManager
The AttributeManager is a transformer that allows us to do many attribute manipulations, including setting up conditional values.
- If the FeePaid attribute is empty, set the value to 0, otherwise leave it as is.
- Create a new attribute called BusinessDisplayName. Set its value to BusinessTradeName, but only if BusinessTradeName has a value; otherwise set it to BusinessName.
Connect the AttributeManager output to the Writer feature type.
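The two conditional rules configured in the AttributeManager can be sketched as follows. This is plain Python standing in for the dialog settings, not FME syntax, and the sample values are invented:

```python
def manage_attributes(record):
    # Rule 1: if FeePaid is empty, set it to 0; otherwise leave it as is.
    if not record.get("FeePaid"):
        record["FeePaid"] = 0
    # Rule 2: BusinessDisplayName takes BusinessTradeName when it has a
    # value, and falls back to BusinessName otherwise.
    record["BusinessDisplayName"] = (
        record.get("BusinessTradeName") or record["BusinessName"]
    )
    return record

r = manage_attributes(
    {"FeePaid": "", "BusinessTradeName": "", "BusinessName": "Acme Ltd"}
)
print(r["FeePaid"], r["BusinessDisplayName"])  # 0 Acme Ltd
```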
8. Modify writer feature type properties
The attributes we removed just now are still on the output schema, and have turned red. The original attribute schema was a copy of the source schema; our transformations have changed the data, so we should update the schema to reflect what we have done.
Open the properties of the destination feature type, go to the User Attributes tab, and click on Automatic for Attribute Definition. The attribute schema reflects the changes we have made.
Since we want to overwrite the Tableau file we initially wrote, open the properties again and change the Table Handling to Drop and Create.
9. Run workspace and confirm .tde file exists
Let’s run our final workspace! Click on the Run button. Confirm the .tde file and DataCleanup.xlsx files were created using Windows Explorer.
10. Examine .tde file in Tableau
In the “Data Source” view, notice all of the records that we have imported from the CSV file. We can also see that the PostCodeTrimmed attribute we created within our workspace is present along with the other fields imported from the CSV file.
Now that our data is imported into Tableau, we can begin to create Data Views to explore this data.
For example, we can create a simple map view to see the data points overlaid on a map:
- create a new sheet
- double-click on latitude and longitude to add our coordinates
- split those up based on their geometry id to see the individual points
- colour them by status to get an overall view of which businesses currently have active licenses.
We could also make use of the postal code field to show our data:
- create a new view
- add PostCodeTrimmed, from “Dimensions”, and colour the areas by the unique count of business licenses in each.
In Part 2, we will modify the workspace we just built to process multiple CSV files.