When using FME Cloud, one of the biggest hurdles customers face is moving their datasets into the cloud, particularly if they are large. Moving data to the cloud can be broken down into three distinct scenarios: small datasets loaded at frequent intervals, large datasets loaded as a one-time bulk upload, and large datasets loaded at frequent intervals. Each scenario comes with a different set of problems, and therefore a different recommended approach for loading data.
Since FME Cloud runs on Amazon Web Services (AWS), most scenarios we’ve seen involve loading data into AWS, but these techniques also apply to loading data into other cloud platforms such as Microsoft Azure.
This is the easiest scenario, as network bandwidth won't be a constraint. What counts as small really depends on your available bandwidth. To upload small datasets at frequent intervals, you can leverage a number of tools provided by AWS and third parties. The following solutions all use HTTP, which should be sufficient unless your internet connection is very unreliable.
Loading data into AWS S3
Loading data into RDS
A database running on RDS has exactly the same interface as a database running on-premises. That means you can use standard tools to load data in.
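As a rough illustration of "standard tools", the sketch below builds the command lines for a dump from an on-premises database and a restore into RDS. It assumes the database engine is PostgreSQL (the article does not name one), and the host names, user, and file name are hypothetical placeholders. The functions only build the argument lists; nothing is executed.

```python
from typing import List


def pg_dump_command(host: str, port: int, user: str,
                    database: str, dump_file: str) -> List[str]:
    """Build a pg_dump command that writes a custom-format dump file."""
    return [
        "pg_dump",
        f"--host={host}",
        f"--port={port}",
        f"--username={user}",
        "--format=custom",      # compressed archive readable by pg_restore
        f"--file={dump_file}",
        database,
    ]


def pg_restore_command(host: str, port: int, user: str,
                       database: str, dump_file: str) -> List[str]:
    """Build a pg_restore command that loads the dump into a target database."""
    return [
        "pg_restore",
        f"--host={host}",
        f"--port={port}",
        f"--username={user}",
        f"--dbname={database}",
        "--no-owner",           # do not try to recreate the source object owners
        dump_file,
    ]


# Hypothetical endpoints: only the host changes between on-premises and RDS.
ON_PREM_HOST = "onprem-db.example.internal"
RDS_HOST = "mydb.abc123.us-east-1.rds.amazonaws.com"

dump_cmd = pg_dump_command(ON_PREM_HOST, 5432, "gis_user", "gis", "gis.dump")
restore_cmd = pg_restore_command(RDS_HOST, 5432, "gis_user", "gis", "gis.dump")
```

Either list can be passed to `subprocess.run(cmd, check=True)` on a machine that has the PostgreSQL client tools installed and network access to both hosts.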
Loading data into S3 and EBS volumes
Network bandwidth is often a constraint when loading large datasets into the cloud. For one-time bulk uploads you can use the AWS Import/Export Snowball service or the similar service offered by Azure. You load your data onto SSDs and ship them to Amazon, who then load your data into a nominated S3 bucket or EBS mount. It is an excellent way to do the initial bulk upload if you plan on doing change-only updates afterwards.
If loading your data over the network would take 7 days or more, definitely consider using AWS Import/Export. First, it’s cost-effective, as you don’t pay for bandwidth, only a handling fee and $2.49 per loading hour. Second, it’s secure: you can use a PIN code and software encryption to protect your data while it is in transit. Finally, your data is guaranteed to be loaded within 1 business day of Amazon receiving it, so it is a relatively fast way to load large datasets.
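To check a dataset against the 7-day rule of thumb, a back-of-the-envelope estimate is usually enough. The sketch below is one way to do it. The function names and the 80% link efficiency are my own assumptions, not measured values, so plug in the throughput you actually observe.

```python
def transfer_days(size_gb: float, bandwidth_mbps: float,
                  efficiency: float = 0.8) -> float:
    """Estimate the days needed to upload size_gb over a bandwidth_mbps link.

    efficiency is the assumed fraction of the nominal bandwidth that is
    really available for the upload (0.8 is a guess, not a measurement).
    """
    if size_gb < 0 or bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, 0 < efficiency <= 1")
    total_bits = size_gb * 8 * 1000 ** 3             # decimal gigabytes to bits
    bits_per_second = bandwidth_mbps * 1_000_000 * efficiency
    return total_bits / bits_per_second / 86_400      # seconds to days


def should_ship_disks(size_gb: float, bandwidth_mbps: float,
                      threshold_days: float = 7.0,
                      efficiency: float = 0.8) -> bool:
    """Apply the 7-day rule of thumb from the text above."""
    return transfer_days(size_gb, bandwidth_mbps, efficiency) >= threshold_days
```

For example, 10 TB (10,000 GB) over a 100 Mbit/s link at 80% efficiency works out to roughly 11.6 days, so `should_ship_disks(10_000, 100)` returns `True`, while 100 GB over the same link takes under three hours.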
Loading data into RDS
A database running on RDS has exactly the same interface as a database running on-premises. That means you can use standard tools to load data in.
This relatively common scenario is the trickiest of the three. What makes it tricky is the frequency: at high frequency you really have to use the network, as AWS Import/Export is too slow.
To upload large volumes of data, the standard AWS tools can be too slow, even if you have a fast internet connection, as they all rely on HTTP. There is overhead with HTTP because it relies on the TCP protocol, which simply wasn’t designed for moving large datasets across the WAN.
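One well-known reason TCP can struggle over long distances is that a single connection cannot have more than one window of unacknowledged data in flight, so its throughput is bounded by window size divided by round-trip time. The sketch below just does that arithmetic. The window sizes and the 100 ms round-trip time are illustrative assumptions; modern operating systems scale the window automatically, so real transfers often do much better than the smallest case.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-connection TCP throughput in Mbit/s.

    At most one window of data can be in flight per round trip,
    so throughput <= window / RTT.
    """
    if window_bytes <= 0 or rtt_ms <= 0:
        raise ValueError("window_bytes and rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


# Classic 64 KB window without window scaling, 100 ms round trip (assumed).
small_window = max_tcp_throughput_mbps(65_535, 100)

# A 4 MiB scaled window over the same assumed round trip.
large_window = max_tcp_throughput_mbps(4 * 1024 * 1024, 100)
```

With these assumed numbers the bound is about 5.2 Mbit/s for the unscaled window and about 335 Mbit/s for the 4 MiB window, regardless of how fast the underlying link is.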
Accelerated file transfer solutions have come to market that leverage UDP, claiming they can facilitate much greater throughput by using more of your available bandwidth, as they are less affected by network overhead. Several such solutions exist. I ran a series of benchmarking tests and found that the overhead for HTTP wasn't as big an issue as I thought. Results are available in this blog post.
The greatest benefit I identified was reliability: these tools essentially turn file upload into a fault-tolerant component. Often we design complicated fault-tolerant architectures in the cloud, leveraging all AWS has to offer to ensure we have a stable, reliable application. However, such a design is only as strong as its weakest link. If you are relying on data being uploaded to the cloud to trigger a workflow, my bet is that the upload is likely to be the weakest link. If uploading files is an integral part of your workflow, I suggest taking a look at the commercial accelerated file transfer solutions.
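If a commercial tool is not an option, a simple way to make an upload step less fragile is to retry it with exponential backoff. The sketch below is a generic illustration under my own assumptions, not how any accelerated file transfer product works: `upload` is a hypothetical callable that performs one upload attempt and raises `OSError` (or a subclass such as `ConnectionError` or `TimeoutError`) on a network failure.

```python
import time
from typing import Callable


def upload_with_retry(upload: Callable[[], None],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Run upload() until it succeeds and return the number of attempts used.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    attempts, and re-raises the last error once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            upload()
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise                      # give up and surface the failure
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. In a workflow that is triggered by the uploaded file, the re-raised error is the point at which you would alert someone rather than let the downstream job silently wait.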
Working with Amazon S3 and FME
Using and configuring S3 with FME Cloud
Using and configuring SQS with FME Cloud
Using and configuring RDS with FME Cloud
© 2019 Safe Software Inc | Legal