Solved

Join based on two fields + maximum difference between distance value in join field

  • 14 February 2020
  • 6 replies
  • 4 views

Badge +4

I have two datasets which need to joined. Both datasets store the attributes 'CI_nummer' (numeric) and 'NaamObject'. An example value of the last looks like: ZN VKS MSI A16L 46,935 1

I want to join the datasets if both these attributes match. . Beside this, the NaamObject also needs to match for a certain minimum of characters.

ZN VKS MSI A16L 46,935 1

46,935 these numbers represent a distance value. 46 kilometres and 935 meters.

When the difference between two distance values (dataset A versus B) is less than 500 meters I want FME to see these values as identical. For example:

These two values should be considered as identical:

ZN VKS MSI A16L 46,935 1

ZN VKS MSI A16L 46,835 3

These two values should be considered as identical

N VKS MSI A16L 46,935 1

ZN VKS MSI A16L 47,122 3

These two values should not be considered as identical

N VKS MSI A16L 46,935 1

ZN VKS MSI A16L 48,122 3

I am thinking about something with a regular expression? Any ideas how to solve this issue?

 

 

 

icon

Best answer by ebygomm 14 February 2020, 10:55

View original

6 replies

Userlevel 1
Badge +21

Assuming the format is consistent, you could split by space and then create new attributes where the number is separated

Depending on your input you could then join on the first attribute and then test for the difference in numbers between the two joined attributes and discard any where the difference is greater than 500m.

Otherwise you'll probably need to use an inlinejoiner to do the range match

Badge +4

Assuming the format is consistent, you could split by space and then create new attributes where the number is separated

Depending on your input you could then join on the first attribute and then test for the difference in numbers between the two joined attributes and discard any where the difference is greater than 500m.

Otherwise you'll probably need to use an inlinejoiner to do the range match

Thanks for your reply! It sounds promising. I am going to try your suggestion.

Badge +4

Assuming the format is consistent, you could split by space and then create new attributes where the number is separated

Depending on your input you could then join on the first attribute and then test for the difference in numbers between the two joined attributes and discard any where the difference is greater than 500m.

Otherwise you'll probably need to use an inlinejoiner to do the range match

It works perfect! :) However, why not creating just 'attr2' and populate it with _list{4} ?

Userlevel 1
Badge +21

It works perfect! :) However, why not creating just 'attr2' and populate it with _list{4} ?

I swapped the comma for a decimal point which would be necessary for doing a numeric comparison with my locale settings. I can't recall whether FME will accept a comma as the decimal separator dependent on the system settings.

Badge +4

I swapped the comma for a decimal point which would be necessary for doing a numeric comparison with my locale settings. I can't recall whether FME will accept a comma as the decimal separator dependent on the system settings.

I get it. Is there also a way to check the consistency of my data?

This would mean that every value of het attribute NaamObject should have the same number of spaces? Maybe you could check for the length of the list?

Userlevel 1
Badge +21
I get it. Is there also a way to check the consistency of my data?

This would mean that every value of het attribute NaamObject should have the same number of spaces? Maybe you could check for the length of the list?

Yes, checking the length of the list would confirm the number of spaces in the attribute. You could also test for contains regex with a more specific match, e.g. 2 letters space 3 letters space 3 letters space etc.

Reply