Solved

Remove multiple underscores that separate words

6 years ago
18 August 2017
10 replies
113 views

geospatiallover
70 replies

I'd like my regex to replace the middle underscores with a space. I cannot seem to escape the underscore character to use with curly brackets for repetitions.

I have strings with multiple underscores and spaces as word separators. I'd like my result to look like below:

word1_worda wordb wordc wordd_word3

Sample strings look like:

sample1: Autobooster 1_Autobooster Unit_Unit_Autobooster Unit

sample2:Anchor_FIELD_VERIFIED_DATE_NonDisplay

With my format, these samples should result in:

sample1: Autobooster 1_Autobooster Unit Unit_Autobooster Unit

sample2:Anchor_FIELD VERIFIED DATE_NonDisplay

I used lookahead and lookbehind, but what I get is the middle string together with the underscores before and after it (the ones between word 1 and word 2, and between word 2 and word 3). Regex below:

\\w(?<=_).*_?\\w(?<=_)

Test string looks like below

Without a working regex, to finish my translation I would have to use a StringReplacer and a StringConcatenator, and then merge the parts back together again.

Best answer by courtney_m 18 August 2017, 17:01

ebygomm
Contributor
3079 replies
6 years ago
18 August 2017

So you want all but the first and last underscores replaced with spaces?

salvaleonrp
Contributor
91 replies
6 years ago
18 August 2017

Yes, that's the ideal.
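
To pin down the rule agreed on above (keep the first and last underscores, turn every underscore between them into a space), here is a minimal Python sketch outside FME. The function name `soften_middle_underscores` is made up for illustration, and returning strings with fewer than two underscores unchanged is an assumption, since the thread does not cover that case.

```python
def soften_middle_underscores(text: str) -> str:
    """Keep the first and last underscore; replace the ones in between with spaces."""
    parts = text.split("_")
    # Fewer than two underscores: nothing lies between a first and a last one
    # (assumed behaviour: return the text as-is).
    if len(parts) < 3:
        return text
    first, middle, last = parts[0], parts[1:-1], parts[-1]
    # Re-join the outer pieces with underscores and the inner pieces with spaces.
    return first + "_" + " ".join(middle) + "_" + last


print(soften_middle_underscores("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))
```

This is only a reference for the expected behaviour; the replies below show how to get the same result with FME transformers.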

courtney_m
125 replies
6 years ago
18 August 2017
Best Answer

I was able to do this with 2 StringSearchers and an AttributeManager:

The first StringSearcher finds the first word and underscore by searching the text for the Regex ^[^_]*_ and saving it as _first_word:

The second StringSearcher finds the last word and underscore by searching the text for the RegEx _[^_]*$ and saving it as _last_word:

Then, in the AttributeManager, I created the _middle_text attribute by trimming _first_word off the left of the text, trimming _last_word off the right of the text, then replacing _ with a space. I used the following notation:

@ReplaceString(@TrimRight(@TrimLeft(@Value(text_line_data),@Value(_first_word)),@Value(_last_word)),"_"," ")

Then, I created the final_text attribute by concatenating the attributes _first_word, _middle_text, and _last_word. Finally, I removed the un-needed attributes.
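
For readers who want to try the same first/middle/last idea outside FME, here is a rough Python sketch. It reuses the two regular expressions from this answer, but it is not the FME workspace itself: the function name is hypothetical, plain slicing stands in for @TrimLeft and @TrimRight, and the guard for fewer than two underscores is an added assumption.

```python
import re


def replace_middle_underscores(text: str) -> str:
    # Guard (an assumption, not part of the workspace): with fewer than two
    # underscores the first and last matches would overlap, so return as-is.
    if text.count("_") < 2:
        return text
    first_word = re.search(r"^[^_]*_", text).group(0)  # e.g. "Anchor_"
    last_word = re.search(r"_[^_]*$", text).group(0)   # e.g. "_NonDisplay"
    # Cut the first and last parts off by length, then swap "_" for " ".
    middle_text = text[len(first_word):len(text) - len(last_word)].replace("_", " ")
    return first_word + middle_text + last_word


print(replace_middle_underscores("Autobooster 1_Autobooster Unit_Unit_Autobooster Unit"))
```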

From the inspector, you can see what the value of final_text is.

I have also attached the workspace, if you want it. I hope this helps!

-Courtney

salvaleonrp
Contributor
91 replies
6 years ago
18 August 2017

Thanks @courtney_m for explaining and providing your workspace. I appreciate that.

danilo_fme
Evangelist
1882 replies
6 years ago
18 August 2017

That is a powerful article. Great! @courtney_m

Thank you, @danilo_inovacao!

You're very welcome, @salvaleonrp. I'm glad I could help.

salvaleonrp
Contributor
91 replies
6 years ago
18 August 2017

The accepted answer is good enough but I wonder if there's a single regex string to remove the extra underscores. Any takers?

takashi
Contributor
7538 replies
6 years ago
19 August 2017

@salvaleonrp, I accept your challenge: "single regex string to remove the extra underscores"

[2017-08-20: Update] Simplified the regex.

Use a StringReplacer with these parameters.

Mode: Replace Regular Expression
Text To Replace: (?<=_)(.*?)_(?=.*_)
Replacement Text: \1<space>

This string expression set to a transformer parameter works as well. Assume a feature attribute called "text" contains the source text string.

@ReplaceRegEx(@Value(text),(?<=_)(.*?)_(?=.*_),\1 )
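
The same pattern can be checked quickly outside FME. The sketch below uses Python's re module rather than FME's regex engine, so small engine differences are possible, and the function name is made up for illustration. The lookbehind (?<=_) makes sure an underscore has already been passed, and the lookahead (?=.*_) makes sure another one still follows, so only the underscores in between are replaced.

```python
import re

# Same pattern as in the StringReplacer parameters above.
MIDDLE_UNDERSCORE = re.compile(r"(?<=_)(.*?)_(?=.*_)")


def replace_with_single_regex(text: str) -> str:
    # "\1 " keeps the captured word and puts a space where the underscore was.
    return MIDDLE_UNDERSCORE.sub(r"\1 ", text)


for sample in (
    "Autobooster 1_Autobooster Unit_Unit_Autobooster Unit",
    "Anchor_FIELD_VERIFIED_DATE_NonDisplay",
):
    print(replace_with_single_regex(sample))
```

Strings with two or fewer underscores come back unchanged, because no underscore then has one before it and one after it. The sketch assumes single-line input, since "." does not match a newline by default.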

Another thought:

1. StringSearcher: Split the source text into 3 parts.

Contains Regular Expression: ^(.*?_)(.*_.*)(_.*)$
Subexpression Matches List Name: _sub

2. StringReplacer: Replace every underscore in the middle part with space.

Attributes: _sub{1}.part
Mode: Replace Text
Text To Replace: _
Replacement Text: <space>

3. StringConcatenator etc.: Simply concatenate the three elements of "_sub{}.part" list.

@Value(_sub{0}.part)@Value(_sub{1}.part)@Value(_sub{2}.part)

The replacement and concatenation can also be performed with a single string expression.

@Value(_sub{0}.part)@ReplaceString(@Value(_sub{1}.part),_," ")@Value(_sub{2}.part)
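
The three-part split can also be sketched in Python for testing. This is an approximation outside FME: the three capture groups play the role of the "_sub{}.part" list elements, the function name is hypothetical, and returning the text unchanged when the pattern does not match is an assumption about what you would want in that case.

```python
import re

# Group 1: up to and including the first underscore.
# Group 2: the middle part, which must itself contain an underscore.
# Group 3: the last underscore and everything after it.
THREE_PARTS = re.compile(r"^(.*?_)(.*_.*)(_.*)$")


def replace_via_three_parts(text: str) -> str:
    match = THREE_PARTS.match(text)
    if match is None:
        # The pattern needs at least three underscores; otherwise there is
        # nothing to replace (assumed behaviour: return the text as-is).
        return text
    head, middle, tail = match.groups()
    return head + middle.replace("_", " ") + tail


print(replace_via_three_parts("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))
```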

salvaleonrp
Contributor
91 replies
6 years ago
21 August 2017

Awesome! I learned something new today that will be valuable in the future. I used an AttributeManager with @ReplaceRegEx to create a new attribute. Thanks @takashi. I have a better understanding of lookahead and lookbehind now.
