Question

How do I prevent FME from correcting my gml data in the reader?


Badge

I have built a workspace that is used to map geometric errors. My data is stored in GML format.

In my workspace I have a FeatureReader that reads GML data. Issues in my source data can include, among other things:

  • Duplicate consecutive vertices in lines or polygons
  • Too many coordinates in my point elements
  • Non-closing polygons (last vertex isn’t the same as the starting vertex)

The problem I have with my GML FeatureReader is that these errors are fixed automatically by FME. The duplicate consecutive points are removed, only the last coordinate of a point element is used, and a polygon is closed automatically by adding the first vertex as the last coordinate. This is convenient when I want to work with the data without hassle, but in this case I want to know what those issues are so I can correct my source data or report errors.

Is there any way to show the errors corrected by the FME reader, or to prevent the reader from doing so?

 


14 replies

Userlevel 5
Badge +26

Perhaps this will work: read it as XML instead and then use a GeometryReplacer to build the geometry

Badge

Thank you for your suggestion.

This could work for a single GML file. However, I have many GML files, each of which can store its geometry in a different element tree. Mapping the structure for all the different GML files would take a lot of effort.

Userlevel 5
Badge +26

Actually, I usually just feed the xml_fragment attribute, which basically contains the entire xml chunk for that feature, into the GeometryReplacer and let that sort it all out for me. It seems to work fine in most cases I've come across.
Badge

What Geometry Encoding are you using for the GeometryReplacer transformer in this case? Using GML will fix the errors automatically again.

Userlevel 5
Badge +26

You're right, it does seem to automatically fix those errors. I don't really see an easy way to not have it do that.

There is a setting on the GML Reader: Parameters -> Advanced -> Continue on Geometry error, but even if I set that to No it will still fix those "small" errors. There is also an option to save the geometry fragment as a separate attribute; you could try comparing that to the resulting geometry. Something as simple as counting the number of values in the posList there and comparing it with the number of vertices should already give you an indication that fixes have been made.
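A minimal sketch of that counting idea, assuming the raw posList text has been pulled out of the saved geometry fragment and that the vertex count of the geometry FME built is available as a plain number. The function names and the `dimension` parameter are hypothetical, not part of FME:

```python
def count_coordinate_tuples(pos_list_text, dimension=2):
    """Return the number of coordinate tuples in one gml:posList string."""
    values = pos_list_text.split()
    if len(values) % dimension != 0:
        raise ValueError(
            f"{len(values)} values cannot be split into {dimension}D tuples"
        )
    return len(values) // dimension


def reader_changed_geometry(pos_lists, fme_vertex_count, dimension=2):
    """Compare the raw tuple count of all posLists with FME's vertex count.

    pos_lists holds one string per gml:posList, so a polygon with a
    donut hole contributes two entries.
    """
    raw_count = sum(count_coordinate_tuples(p, dimension) for p in pos_lists)
    return raw_count != fme_vertex_count
```

A count mismatch only indicates that something was changed. A polygon that had one duplicate removed and one closing vertex added would show the same count, so this check can miss combined fixes.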

We have found an article by Safe about internal geometry operations, but it contains very little information and doesn't offer a solution. It specifies that operations can be performed on the geometry on some occasions. However, the list is not extensive.

That is a possibility, but with all the possible geometries available in GML it also looks like a lot of work.

Thanks for the suggestion. However, the attribute name changes with each GML feature type and the content can get very complex. For example, when working with donut holes, there are 2 posList elements in the GML. Using this attribute in a GeometryReplacer will indeed correct the geometry again.

Userlevel 4
Badge +25

Do you have a sample dataset? I'm sort of surprised it happens on reading. Generally (I thought) we only make fixes like that when writing data, and only then in cases where the output would otherwise be invalid. I personally don't think we should be making changes to incoming data.

If the data is confidential then please submit it to our support team (safe.com/support) and they can query this with our developers.

Badge

I couldn't upload the file as gml so I saved it as text:

safe example.txt

In the file are 3 different data features:

1. A point feature with too many coordinates. It is corrected by discarding the first coordinate(s).

2. A polygon where the last coordinate does not match the first coordinate. It is closed automatically by FME (by adding the first coordinate as the last coordinate).

3. A polygon containing duplicate consecutive coordinates. It is corrected by removing the duplicate.

Badge

Is this issue still on the radar? We are looking forward to a solution.

Userlevel 4
Badge +25

Yes, apologies, I'll look at it later today.

Userlevel 4
Badge +25

OK, so I've been looking into this and getting feedback from our developers.

It's working as designed, in that it was designed - rightly or wrongly - to automatically handle these issues. However, there are a few "settings" that we can tweak that will help a little.

So to explain... as I understand it, when FME reads GML data it has a specification for the GML and reads it into the translation. In my workspace it's denoted with this log message:

Using XSD semantics configuration file 
'file:///C:/Program Files/FME2019/xml/gml_v3.2/gml_config.xml'.

That's for plain GML. If I was reading CityGML instead, it would use a different file. Anyway, that specification tells FME how to handle each type of geometry. There are some settings that apply, one of which is called keep-duplicate-coordinates!

In the CityGML spec, that option is turned on by default. In the regular GML spec, it is turned off. So, I can open that file (gml_config.xml) in a text editor and make some edits.

Firstly I find this section (look for gml:pos):

               <mapping match="gml:pos">
                  <signature name="xfmap-pos-collector"/>
                  <references>
                     <macro:use name="geometry-references"/>
                  </references>                                            
                  <geometry>
                     <macro:use name="geometry-builder-coordinate-system-data"/>
                     <data name="axis-separator">
                        <literal expr="whitespace"/>
                     </data>
                     <data name="coord-separator">
                        <literal expr="|"/>
                     </data>

It starts at line 1804 in my version. Then I add three lines to turn on the option to keep duplicate coordinates...

               <mapping match="gml:pos">
                  <signature name="xfmap-pos-collector"/>
                  <references>
                     <macro:use name="geometry-references"/>
                  </references>                                            
                  <geometry>
                     <macro:use name="geometry-builder-coordinate-system-data"/>
                     <data name="axis-separator">
                        <literal expr="whitespace"/>
                     </data>
                     <data name="keep-duplicate-coordinates">
                        <literal expr="true" />
                     </data>
                     <data name="coord-separator">
                        <literal expr="|"/>
                     </data>

Now I find another section about 40 lines later (look for gml:posList)...

               <mapping match="gml:posList gmlcov:positions">                         
                  <signature name="xfmap-posList-collector"/>
                  <references>
                     <macro:use name="geometry-references"/>
                  </references>                                            
                  <geometry>
                     <macro:use name="geometry-builder-coordinate-system-data"/>
                     <data name="axis-separator">
                        <literal expr="whitespace"/>
                     </data>
                     <data name="coord-separator">
                        <literal expr="whitespace"/>
                     </data>

...and add the same setting there:

               <mapping match="gml:posList gmlcov:positions">                         
                  <signature name="xfmap-posList-collector"/>
                  <references>
                     <macro:use name="geometry-references"/>
                  </references>                                            
                  <geometry>
                     <macro:use name="geometry-builder-coordinate-system-data"/>
                     <data name="axis-separator">
                        <literal expr="whitespace"/>
                     </data>
                     <data name="keep-duplicate-coordinates">
                        <literal expr="true" />
                     </data>
                     <data name="coord-separator">
                        <literal expr="whitespace"/>
                     </data>

Now, when I run the workspace again, duplicate coordinates are kept in the data.
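If the same edit has to be repeated across several FME installations, it could be scripted. The following is only a sketch, assuming the configuration file is laid out like the excerpts above (one element per line, with the coord-separator entry inside each gml:pos and gml:posList mapping). The function name `enable_keep_duplicates` is hypothetical, and it is safer to run it on a backup copy of gml_config.xml rather than on the installed file:

```python
import re

# The three lines added by hand in the excerpts above.
KEEP_BLOCK = (
    '<data name="keep-duplicate-coordinates">',
    '   <literal expr="true" />',
    '</data>',
)
TARGETS = {"gml:pos", "gml:posList"}
MAPPING_RE = re.compile(r'<mapping\s+match="([^"]*)"')


def enable_keep_duplicates(config_text):
    """Return config_text with the setting added to the target mappings."""
    out = []
    in_target = False
    for line in config_text.splitlines(keepends=True):
        mapping = MAPPING_RE.search(line)
        if mapping:
            # A match attribute can list several names separated by spaces.
            in_target = bool(TARGETS & set(mapping.group(1).split()))
        elif "</mapping>" in line:
            in_target = False
        elif in_target and 'name="keep-duplicate-coordinates"' in line:
            in_target = False  # setting already present, leave it alone
        elif in_target and '<data name="coord-separator">' in line:
            indent = line[: len(line) - len(line.lstrip())]
            newline = "\r\n" if line.endswith("\r\n") else "\n"
            out.extend(indent + part + newline for part in KEEP_BLOCK)
            in_target = False
        out.append(line)
    return "".join(out)
```

The function works on text rather than on a parsed XML tree so that the rest of the file, including comments and whitespace, stays byte-for-byte unchanged. Running it twice gives the same result as running it once.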

That setting - keep-duplicate-coordinates - is documented here: http://docs.safe.com/fme/2021.0/html/FME_Desktop_Documentation/FME_ReadersWriters/xml/xml_line.htm

You'll notice that there are two other similar settings: demote-incomplete-geometry and allow-incomplete-geometry. They're both false by default so by adding/applying those settings you would set up FME to stop auto-processing other irregularities.

There are similar settings for other geometries - for example polygons.

Now, I don't know if that will account for all of your issues. I don't see a specific setting for multiple coordinates in a point feature, and the incomplete geometry setting for a polygon will still close a polygon without a matching end-coordinate (it will only change to pass a polygon with two or fewer coordinates).

So it's not a total solution here, and applying each of these settings does involve manual editing of these setup files, rather than just turning on a parameter in FME itself. There is some effort and testing involved.

The alternatives would be:

  • I file a request (with our development team) to add more settings to do what you want
  • I file a request to expose these settings as parameters
  • I file a request for the XMLValidator transformer to validate geometry as well as syntax/schema
  • You set up a solution that ignores these settings. It would need a workspace created to read the GML file (using a textfile reader to preserve the original content) and to parse the content to look for issues. I can see this being a worthwhile project, but perhaps not one that you want to embark on.
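As a rough illustration of that last alternative, the sketch below scans the raw GML text outside FME and reports the three issues from the question. It is not an FME feature. It assumes a fixed coordinate dimension (it does not read srsDimension), assumes all coordinate values are numeric, and only looks at gml:pos and gml:posList elements. The function names are hypothetical:

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Strip the namespace so any GML version's elements are matched."""
    return element.tag.rsplit("}", 1)[-1]


def to_tuples(text, dimension):
    """Split whitespace-separated numbers into coordinate tuples."""
    values = [float(v) for v in (text or "").split()]
    return [tuple(values[i:i + dimension]) for i in range(0, len(values), dimension)]


def find_geometry_issues(gml_text, dimension=2):
    """Return a list of issue descriptions found in the raw GML."""
    root = ET.fromstring(gml_text)
    issues = []
    # Remember which posList elements sit inside a LinearRing.
    ring_pos_lists = {
        id(child)
        for ring in root.iter()
        if local_name(ring) == "LinearRing"
        for child in ring.iter()
        if local_name(child) == "posList"
    }
    for element in root.iter():
        name = local_name(element)
        if name == "pos":
            count = len((element.text or "").split())
            if count != dimension:
                issues.append(f"pos has {count} values, expected {dimension}")
        elif name == "posList":
            points = to_tuples(element.text, dimension)
            if any(a == b for a, b in zip(points, points[1:])):
                issues.append("posList has consecutive duplicate coordinates")
            if id(element) in ring_pos_lists and points and points[0] != points[-1]:
                issues.append("LinearRing posList is not closed")
    return issues
```

Because the check never builds a geometry, nothing is corrected along the way. The trade-off is that every GML construct that should be checked (for example gml:coordinates or curve segments) has to be handled explicitly.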

Obviously, my filing a request doesn't guarantee that it will be implemented, or when. So that's more of a long-term strategy.

I hope this is of some use. Let me know if you need further help with the settings above, or trying one of the alternative solutions. I'll also suggest that you contact my colleague Dean through the support team (safe.com/support) if you get really stuck. He's our XML expert and he may have other insights that I don't.

Regards

Mark

@mark2atsafe thanks for looking into it. At least we now have an understanding of a part of the issue. We will need to investigate with the rest of the organization how to resolve this. 
