Word Frequency Transformer

Related products: FME Form General

I was just thinking it would be great if I could determine the word frequency of some text held in an attribute.

I bet there is a Python function to do that, or a number of different transformers could be put together to figure it out. Alas, I'm a simple lizard with limited time and knowledge...

Maybe someone can whip something up for me? Oh! Oh! And maybe add it to the FME Hub for all to enjoy? That would be great!

Great idea. I think the challenge should include a sample text with a given result to aim for. The text should include stuff like international characters, apostrophes, hyphens etc. to make sure non-trivial edge cases are treated correctly.

Possible starting point: http://www.nltk.org/

This would be really easy to implement with just two standard FME transformers.

Use an AttributeSplitter (with a space as the delimiter) to split your text into a list of words. Then use a ListHistogrammer on that list to count how often each word occurs.
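For illustration only, here is a rough Python equivalent of that two-transformer approach: split on whitespace, then count each distinct token with `collections.Counter`. The function name is hypothetical. Note that punctuation stays attached to the words, so "dog," and "dog" would be counted separately.

```python
from collections import Counter

def naive_word_frequency(text: str) -> Counter:
    # Split on whitespace (roughly what an AttributeSplitter with a space delimiter does)...
    words = text.split()
    # ...then count each distinct value (roughly what a ListHistogrammer does).
    return Counter(words)

counts = naive_word_frequency("the cat saw the dog, the end.")
print(counts["the"], counts["dog,"])  # 3 1  (note the comma is still attached)
```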


Well, you would need a couple of extra transformers to strip out characters that aren't part of words (commas, full stops, brackets, inverted commas, etc.) before calculating the word frequency. Rejoining words that are hyphenated across two lines would also be nice.
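A minimal Python sketch of that cleanup, assuming a "word" is a run of Unicode letters or digits, optionally joined by internal apostrophes or hyphens (so "don't" and "well-known" stay whole). It rejoins words hyphenated at a line break, normalizes and case-folds international characters, and ignores everything else. The pattern and function name are illustrative choices, not an FME feature.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a word is letters/digits (any script), optionally joined by
# internal apostrophes (' or ’) or hyphens, e.g. "don't", "well-known".
WORD_RE = re.compile(r"[^\W_]+(?:['’-][^\W_]+)*")

def word_frequency(text: str) -> Counter:
    # Rejoin words hyphenated across a line break: "trans-\nformer" -> "transformer".
    # Trade-off: a genuine compound split at a line end ("well-\nknown") is merged too.
    text = re.sub(r"(\w)-[ \t]*\r?\n\s*(\w)", r"\1\2", text)
    # Compose accented characters consistently (NFC), then case-fold so that
    # "Café", "CAFÉ" and "café" count as the same word.
    text = unicodedata.normalize("NFC", text).casefold()
    # findall skips anything that is not part of a word (commas, brackets, quotes...).
    return Counter(WORD_RE.findall(text))

sample = "Don't panic! The trans-\nformer counts café, Café and (CAFÉ)."
counts = word_frequency(sample)
print(counts["café"], counts["transformer"], counts["don't"])  # 3 1 1
```

Deciding what counts as a word is the hard part, and the regular expression is where that decision lives; adjust it to match whatever definition the challenge settles on.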

Anyway, what is a word?
https://www.sussex.ac.uk/webteam/gateway/file.php?name=essay---what-is-a-word.pdf&site=1

I vote for this one, and also for this WordCloud idea:

https://knowledge.safe.com/idea/51622/wordcloud-transformer.html

I think they are related in many ways: to create WordClouds we would really need word counting. So a nice, fast WordCounter with an easy way to exclude common, unimportant words would rock.
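A minimal sketch of such a WordCounter in Python, assuming a small illustrative stop-word list. `STOP_WORDS` and `word_counter` are hypothetical names, not an existing FME transformer, and a real stop-word list would be longer and language-specific. The resulting counts could serve as the weights for a word cloud.

```python
import re
from collections import Counter

# Illustrative stop-word list only; swap in a proper list for your language.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it"})
# Same assumed word definition: letters/digits with internal apostrophes or hyphens.
WORD_RE = re.compile(r"[^\W_]+(?:['’-][^\W_]+)*")

def word_counter(text: str, stop_words: frozenset = STOP_WORDS) -> Counter:
    # Case-fold first so "The" is caught by the lowercase stop-word list.
    words = WORD_RE.findall(text.casefold())
    # Skip the common, unimportant words before counting.
    return Counter(w for w in words if w not in stop_words)

counts = word_counter("The cat and the hat: a cat in a hat is a cat.")
print(counts.most_common(2))  # [('cat', 3), ('hat', 2)]
```

Passing the stop words as a parameter keeps the exception list easy to change per dataset, which is the "nice way" of including exceptions asked for above.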


The FME Hub transformer UniqueValueLogger may be of assistance here too... Lots of great ideas on how to implement this!


...and/or something to read a text file or Word document and count the number of times a word, phrase or sentence repeats. I put together a lengthy Python script for this in the past, so this would be great!
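Counting repeated phrases can be sketched in Python by counting runs of n consecutive words (n-grams). This assumes the same simple word definition as a plain word count; `phrase_frequency` is a hypothetical name, and this version does not stop phrases at sentence boundaries.

```python
import re
from collections import Counter

# Assumed word definition: letters/digits with internal apostrophes or hyphens.
WORD_RE = re.compile(r"[^\W_]+(?:['’-][^\W_]+)*")

def phrase_frequency(text: str, n: int = 2) -> Counter:
    words = WORD_RE.findall(text.casefold())
    # zip over n staggered slices of the word list yields every run of n consecutive words.
    ngrams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)

counts = phrase_frequency("To be, or not to be", n=2)
print(counts["to be"])  # 2
```

To process a plain-text file, pass it `open(path, encoding="utf-8").read()`; a Word document would need its text extracted first.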


I can't put it on the hub, but anyone is welcome to develop this custom transformer with my blessing.

wordcounter.fmx
See the FME Challenge for this idea:

https://knowledge.safe.com/questions/54342/word-frequency-challenge.html