Hadoop

Related products: FME Form

Support reading from and writing to HDFS, Hive, and other Hadoop components -- add your ideas in the comments.
Reading and writing the Avro and Parquet file formats natively would be very useful for any integration work. The same is true of direct HDFS access, so those files could be stored directly on the cluster. /Mats 😺
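For illustration, here is a minimal Python sketch (outside FME, e.g. something a PythonCaller could do today) of the kind of Parquet-on-HDFS access being asked for natively. It assumes pyarrow with libhdfs/HADOOP_HOME configured; the host, port, and paths are placeholders.

```python
import pyarrow.fs as pafs
import pyarrow.parquet as pq

# Connect to the cluster's NameNode (placeholder host/port).
hdfs = pafs.HadoopFileSystem(host="namenode.example.com", port=8020)

# Read a Parquet dataset straight off the cluster into an Arrow table.
table = pq.read_table("/data/landing/parcels.parquet", filesystem=hdfs)
print(table.num_rows, table.schema)

# Write the (possibly transformed) table back to HDFS as Parquet.
pq.write_table(table, "/data/processed/parcels.parquet", filesystem=hdfs)
```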

Just yesterday we had a request from a customer to be able to write to HDFS. We will post more details as we learn more.


I confirm, it would be very useful to work with Hadoop HDFS or Spark. It is time (if not already late) to enter the Big Data world; Amazon cloud support alone is not sufficient. We are benchmarking ETL tools and this is one of our criteria.


Safe PR#60154


This idea is a bit broad right now and I'd suggest splitting out related Hadoop requests into their own ideas. However, HDFS read/write is now available in the FME 2018 betas via the HDFSConnector transformer. Give it a spin via http://www.safe.com/beta and let us know what you think.


This would be very challenging, but FME workflows are extremely similar to what one would like to build in Spark / Hadoop. I think it would be amazing to be able to run huge transformations (millions / billions / trillions of records) natively in Spark / Hadoop, using the FME GUI to design the workflow and FME Server to kick off and manage the Spark / Hadoop jobs.

That is, each reader/writer could read from and write to HDFS exactly as it does now with local storage (for common spatial, XLS, and similar formats), in addition to supporting the more Hadoop-specific file types (MapFile, SequenceFile, Avro, etc.). Each transformer could then become a step in the Spark / Hadoop workflow. (There is a performance hit, but Python can be run directly in Spark / Hadoop, and Python seems to back quite a bit of FME. Java/Scala would be preferable, but Python would get the job done in most cases... and parts such as joins could then be optimized natively.)
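To make that concrete, here is a rough PySpark sketch of the read → transform → write pattern described above; the paths, column names, and filter condition are purely hypothetical, and this is not code FME generates.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fme-style-pipeline").getOrCreate()

# "Reader": load a Parquet dataset from HDFS.
df = spark.read.parquet("hdfs:///data/landing/addresses.parquet")

# "Transformers": a Tester-like filter followed by an AttributeCreator-like step.
df = (df
      .filter(F.col("country") == "SE")
      .withColumn("full_address",
                  F.concat_ws(" ", F.col("street"), F.col("city"))))

# "Writer": persist the result back to the cluster.
df.write.mode("overwrite").parquet("hdfs:///data/processed/addresses_se.parquet")

spark.stop()
```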

I know at least one large company that would buy FME if it supported Hadoop in this way. (I realize this goes way beyond the mapping space, but I've seen a company spend millions of dollars trying to build what FME does, only running on top of Hadoop. I've used Ab Initio, DataStage, and Pentaho, and none of them compare to the user-friendliness of FME. They are all too complex; if they focused on input / simple translations / output like FME, they would be radically better. And if you need something more complex, string multiple "workspaces" together.)

Probably tl;dr, but these are just some observations from having been on multiple sides of this business.

