A dataset schema (data model) consists of multiple parts. Some parts relate to attributes, other parts relate to the spatial data.
The spatial part of a schema usually defines the feature types (layers, tables, etc.) and the geometry types (lines, points, polygons, etc.) that exist, or are permitted to exist, in a dataset.
A dataset violates its schema when a feature exists outside of the permitted feature types (for example, a layer of data has a different name to the one in the dataset specifications) or has a geometry type other than those permitted (for example, a line feature exists on a layer meant for polygon features).
This can be important for internal (corporate) consistency and integrity, but also when using formats that are strictly defined by the table names and geometry types permitted.
FME can deal with format limitations automatically, but the user must define whether data meets a corporate data standard or not. Because there are various tests that can be made, there are various transformers in FME that can be used to test them. The following example and notes cover just a few of these.
Source dataset (including Permitted Layer List)
Locating Invalid Feature Types: Workspace as a Template
Counting and Fixing Invalid Feature Types: Workspace as a Template
Locating and Fixing Invalid Geometry Types: Workspace as a Template
The source dataset for this example provides information on construction activity and projects that may affect the flow of traffic in the city of Vancouver. It is stored in GML format:
In theory, all of these features should consist of simple polygon geometries. The layer each item exists on should represent the organization undertaking the construction. Permitted values are:
However, we can't be sure that the correct layers have been used, or the correct geometry, and we shall have to test that.
Follow these steps to learn how to identify source feature types (layers) that exist in a source dataset.
1. Start FME Workbench and begin with an empty canvas. Select Readers > Add Reader from the menubar.
Set the data format to GML (Geography Markup Language) and select the attached GML dataset as the source. Set the Workflow Options to Single Merged Feature Type (to make sure all objects are read as a single layer) and click OK to add the reader:
Save the workspace.
2. Place a DuplicateFilter transformer connected to the reader feature type. Inspect the parameters and set the Key Attribute to fme_feature_type:
fme_feature_type records the layer of the source data, so by filtering out a single example of each we have effectively created a list of feature types (layers) in the source dataset.
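Outside of Workbench, the idea behind this step can be sketched in a few lines of Python. This is only an illustration of keeping the first feature seen for each fme_feature_type value; the function name, the dictionary representation of a feature, and the sample records are assumptions made for the example, not part of FME.

```python
def unique_feature_types(features):
    """Return the first feature seen for each fme_feature_type, in input order."""
    seen = set()
    unique = []
    for feature in features:
        layer = feature["fme_feature_type"]
        if layer not in seen:
            seen.add(layer)
            unique.append(feature)
    return unique


# Hypothetical sample features; only the layer names come from the article.
features = [
    {"fme_feature_type": "private", "id": 1},
    {"fme_feature_type": "xyz", "id": 2},
    {"fme_feature_type": "private", "id": 3},
    {"fme_feature_type": "tellus", "id": 4},
]

layers = [f["fme_feature_type"] for f in unique_feature_types(features)]
print(layers)  # ['private', 'xyz', 'tellus']
```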
3. Select Writers > Add Writer from the menubar. Set the data format to Text File and define a location to write the text file to.
Connect the DuplicateFilter:Unique port to the textfile writer's feature type. Map the attribute fme_feature_type to the writer's text_line_data (either by drawing a connection or using an AttributeManager transformer):
4. Run the translation. Open the output text file. We now have a list of all layers that are used in the source dataset, both valid and invalid:
For instance, "private" is valid, "xyz" is invalid, and "tellus" is obviously a typo that should be "telus".
We now have a list of feature types, and can see that some are invalid. But to count or filter these we need to know which layers are permitted, and preferably to have these stored in a file somewhere.
There are a number of transformers that could be used to match a feature type to this list - for example, the AttributeFilter - but here we'll use the DatabaseJoiner.
5. Place a DatabaseJoiner transformer into the workspace, connected to a second output from the source dataset:
Inspect the DatabaseJoiner parameters. Set them up as follows:
With this setup, any feature that emerges from the Unjoined port has an invalid feature type.
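The Joined/Unjoined split amounts to a membership test against the permitted layer list. The Python sketch below shows that logic under stated assumptions: features are plain dictionaries, matching is an exact string comparison, and the permitted set contains only the two names this article confirms as valid ("private" and "telus"), not the full list supplied with the source dataset.

```python
def split_by_permitted(features, permitted_layers):
    """Split features into (joined, unjoined) by layer-name membership."""
    joined, unjoined = [], []
    for feature in features:
        if feature["fme_feature_type"] in permitted_layers:
            joined.append(feature)
        else:
            unjoined.append(feature)
    return joined, unjoined


# Partial permitted list: only the names mentioned as valid in the article.
permitted_layers = {"private", "telus"}

# Hypothetical sample features.
features = [
    {"fme_feature_type": "private"},
    {"fme_feature_type": "xyz"},
    {"fme_feature_type": "tellus"},
    {"fme_feature_type": "telus"},
]

joined, unjoined = split_by_permitted(features, permitted_layers)
print([f["fme_feature_type"] for f in unjoined])  # ['xyz', 'tellus']
```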
6. Place a StatisticsCalculator connected to the DatabaseJoiner:Unjoined output port. Inspect the parameters and set them up as follows:
7. Connect an Inspector transformer to the StatisticsCalculator:Complete output port and run the workspace. The Data Inspector will show all the features that have an incorrect layer, with the layer recorded on the attribute fme_feature_type, and the number of invalid features recorded in the attribute BadFeatures:
So now we have a count of the features with invalid layers. We can't fix the layer names - because we don't know what they should be - but we have cleaned the dataset by filtering out these invalid features.
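The counting step can be pictured as follows. This is a rough Python equivalent, not the StatisticsCalculator itself: it attaches the total number of invalid features to each one as BadFeatures (the attribute name used above) and, as an extra that the workspace does not produce, tallies them per layer. The helper name and the sample data are hypothetical.

```python
from collections import Counter


def count_bad_features(unjoined):
    """Annotate each invalid feature with the total count; also tally per layer."""
    total = len(unjoined)
    per_layer = Counter(f["fme_feature_type"] for f in unjoined)
    annotated = [{**f, "BadFeatures": total} for f in unjoined]
    return annotated, per_layer


# Hypothetical invalid features, as they might leave the Unjoined port.
unjoined = [
    {"fme_feature_type": "xyz"},
    {"fme_feature_type": "tellus"},
    {"fme_feature_type": "xyz"},
]

annotated, per_layer = count_bad_features(unjoined)
print(annotated[0]["BadFeatures"])  # 3
print(per_layer["xyz"])             # 2
```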
Follow these steps to learn how to identify source features that have an incorrect geometry type.
8. Select Writers > Add Writer from the menubar. Set the data format to Esri Shapefile and define a location to write the dataset to. For the Shapefile Definition parameter, select Copy from Reader:
Connect the newly created writer feature type to the DatabaseJoiner:Joined port.
9. Inspect the parameters for the new writer feature type.
For the Shapefile Name click the drop-down arrow and select Attribute Value > fme_feature_type. This will ensure the data is written to the same layer it came from. Now set Geometry to shape_polygon:
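Setting the Shapefile Name from an attribute value means each distinct fme_feature_type ends up in its own output file. A minimal Python sketch of that grouping is shown here; the function name and the sample features are assumptions, and no files are actually written.

```python
from collections import defaultdict


def fanout_by_layer(features):
    """Group features under an output file name derived from their layer."""
    files = defaultdict(list)
    for feature in features:
        files[feature["fme_feature_type"] + ".shp"].append(feature)
    return dict(files)


# Hypothetical features that passed the layer check.
features = [
    {"fme_feature_type": "private", "id": 1},
    {"fme_feature_type": "telus", "id": 2},
    {"fme_feature_type": "private", "id": 3},
]

files = fanout_by_layer(features)
print(sorted(files))  # ['private.shp', 'telus.shp']
```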
Run the workspace. Check the translation log. Notice that there are 172 warning messages!
Some features are rejected from the output because they are not an area geometry:
WARN |Error - Expected an aggregate or area geometry.
WARN |REJECTING BELOW FEATURE:
INFORM|Geometry Type: IFMEMultiCurve
Other features are rejected because they have too few points:
WARN |Polygon feature must have at least 4 coordinates...rejecting
WARN |REJECTING BELOW FEATURE:
...presumably they are two-point line features.
Some other features are not rejected, but they are an aggregate (group) of polygons joined together that need to be split up:
WARN |Dropping heterogeneous aggregate feature for the ESRISHAPE Writer, due to feature type allowed geometries restriction
WARN |Geometry Type: IFMEAggregate
A count further in the log tells us how many features were rejected per file:
WARN |Rejected 3 output features
WARN |Rejected 11 output features
WARN |Rejected 1 output features
WARN |Rejected 2 output features
So FME has fixed some geometry types where it can and rejected others. We also have a count of how many features were rejected.
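If the log has been saved to a file, the rejection counts can also be pulled out with a short script rather than by reading through it. The Python sketch below assumes the log text contains lines in the "Rejected N output features" form shown above; the function name is hypothetical, and the sample text simply repeats the four messages quoted in this article.

```python
import re

# Matches the per-file summary lines quoted from the translation log.
REJECTED = re.compile(r"Rejected (\d+) output features")


def rejected_counts(log_text):
    """Return the rejected-feature count from each matching log line."""
    return [int(n) for n in REJECTED.findall(log_text)]


log_text = """\
WARN |Rejected 3 output features
WARN |Rejected 11 output features
WARN |Rejected 1 output features
WARN |Rejected 2 output features
"""

counts = rejected_counts(log_text)
print(counts)       # [3, 11, 1, 2]
print(sum(counts))  # 17
```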
Additionally (as long as you saved the workspace in step 1), all rejected features are stored in an FFS (FME Feature Store) format dataset as a form of a spatial log:
So the invalid features have been separated out into a new dataset without any extra setup!
To do this within the workspace instead, we could use a GeometryFilter transformer to separate out non-polygon features before they reach the writer.
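Conceptually, that filtering step routes each feature by its geometry type. The Python sketch below is a stand-in for the idea rather than a description of how the GeometryFilter works internally: the geometry_type key and its values ("polygon", "line") are hypothetical labels chosen for the example.

```python
def filter_polygons(features):
    """Split features into (polygons, others) by their geometry type label."""
    polygons, others = [], []
    for feature in features:
        if feature["geometry_type"] == "polygon":
            polygons.append(feature)
        else:
            others.append(feature)
    return polygons, others


# Hypothetical features with a simplified geometry type label.
features = [
    {"id": 1, "geometry_type": "polygon"},
    {"id": 2, "geometry_type": "line"},
    {"id": 3, "geometry_type": "polygon"},
]

polygons, others = filter_polygons(features)
print([f["id"] for f in polygons])  # [1, 3]
print([f["id"] for f in others])    # [2]
```

Only the polygons list would then be passed on to the writer; the others list plays the role of the rejected-feature dataset.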
The data used here originates from open data made available by the City of Vancouver, British Columbia (data.vancouver.ca). It contains information licensed under the Open Government License - Vancouver.
Can you explain how to get the following values?
Thank you.
WARN |Error - Expected an aggregate or area geometry.
WARN |REJECTING BELOW FEATURE:
INFORM|Geometry Type: IFMEMultiCurve
.....................................
.....................................
WARN |Rejected 3 output features
WARN |Rejected 11 output features
WARN |Rejected 1 output features
WARN |Rejected 2 output features
Hi @miguelhacar,
I'm not sure what exactly your question is. Are you encountering a different error?
But from the messages you copied:
WARN |Error - Expected an aggregate or area geometry.
WARN |REJECTING BELOW FEATURE:
INFORM|Geometry Type: IFMEMultiCurve
WARN |Rejected 3 output features
WARN |Rejected 11 output features
WARN |Rejected 1 output features
WARN |Rejected 2 output features
These are messages found in the log file after creating/running the Locating and Fixing Invalid Geometry Types workspaces in this article. These warnings occur because the geometry of the Shapefile writer is set to polygon and there are features that do not match this schema hence they are rejected.
To see the warnings in the log file, you will need to enable this option: FME Options > Translations > Log Message Filter > Log Warnings.
@fgiron: Would you be able to provide further help for Miguel?
- Andrea
Hi @andreaatsafe, thanks a lot for letting us know. I'll get in touch with @miguelhacar and see whether I can help him :-)
© 2019 Safe Software Inc | Legal