Latest (post-release) FME 2009 or FME 2010 is needed to run the attached workspaces!
This scenario shows how to prepare raster tiles from vector data for web mapping platforms. The scenario uses
Microsoft Bing Maps tile system, however, results can be used with Google Maps and other mapping platforms.
Bing Maps (aka Virtual Earth) is a powerful and flexible tool to serve data over the Internet. FME can play an essential role in preparing data for Bing Maps. I have already published a scenario that explains how to convert raster data into Bing Maps layers of raster tiles.
What I find very convenient about Bing Maps tiles is that we always know tile and pixel sizes in ground units, which are constant throughout whole Earth at each zoom level. This is defined by the Spherical Mercator projection. It allows to set some parameters in an easy way.
Now imagine, we would like to take some vector data and display it in Bing Maps or Google Maps (note that both links point to the same tile set). There are several ways of doing this; in this article I will speak about just one way – rasterization and making Bing Maps tiles.
This article implies that you know some basics about how Bing Maps works, what are tiles, quadkeys, and zoom levels.
As a source map I used contour data from
VMap level 0.This dataset has worldwide coverage and detail level of 1:1,000,000 map.
The contours give us an interesting challenge – if we display all the lines at every zoom level, at lower zoom levels contours will simply cover the background map in the mountain areas entirely, so we have to invent an algorithm that would take care about selecting contours of certain elevations for each zoom level. For example, 1000 m contour should be seen at any zoom level, whereas 100 m contour only when we zoom in closer than level 7.
On the other hand, it can be quite useful to see all the contours to get the feeling of the Earth relief. Besides, keeping all the contours is a wonderful stress test for FME Desktop and FME Server, so I publish both variants.
We also would like to achieve the highest quality of the raster tiles at each zoom level. This means that we cannot resample tiles created for one zoom level when we go to the next one – rasterization should happen separately for each level.
Have a look at the illustration below. We can see how the the raster tiles fade out when WebMapTiler makes tiles from a source raster (the row above). When we make vector tiles and rasterize them separately, we have consistent quality across all zoom levels, plus, we have control over the contents at each level (the row below):
There are 19 zoom levels in Virtual Earth. Each new layer has four times more tiles than the previous one. If we would like to go down to zoom levels 14-15, we should get dozens of millions of tiles, and this is why I am also going to talk about performance challenges that we had to overcome. The process shows how to use the full power of FME with spatially enabled database and FME Server.
The source data in VPF format can be downloaded from the
National Geospatial-Intelligence Agency (NGA) web site. For your convenience, I downloaded and converted data into FFS format. The full archive can be downloaded from the attached World_Contours.zip. Uncompressed, the full dataset occupies over 600 megabytes. The data contains contours with 1000 feet interval (in some places 500 feet and a few other contour lines with different intervals).
The best way to store and retrieve such amounts of data for our purpose is in a spatially enabled database. In my example, I used
PostGIS database (the workspace converting FFS to PostGIS is attachment ffs2postgis.fmw).
For those, who are scared of databases, I can tell that it took me just above an hour to download, install, and be able to run PostgreSQL and PostGIS on top of it, despite the fact that I am not a database specialist at all.
Preparing Vector Data
"Smart" Data Selection
While preparing data for rasterization, I added a filter, that selects data depending on contour interval, so that say zoom level 14 has all contours, zoom level 13 – only divisible by 10, level 12 will have contours divisible by 20, etc.We can use testers for that, but sometimes a simple expression in ExpressionEvaluator does a perfect job: @Value(_elevation)%1000==0?8:(@Value(_elevation)%500==0?9: (@Value(_elevation)%200==0?10:(@Value(_elevation)%100==0?11: (@Value(_elevation)%20==0?12:(@Value(_elevation)%10?13:14)))))
This way we calculate the highest (smallest number) zoom level at which a contour should be seen.
After that, a little loop will allow us to make several set of contours for each zoom level.
So the first set would contain only a thousandth contour, the second - 1000th and 500th, third, 1000th, 500th and 200th, and so on.
Gradient ramp is a good choice for a wide range of values such as contour elevations. Setting colors with a PythonCaller is a fast and easy way to assign fme_color attribute. Here is an example of the python code used in the attached workspaces:
Usually, FME makes Bing Maps tiles from an incoming raster with the WebMapTiler transformer. This approach may work not so well with vector datasets. Rasterizing a map down to zoom level 10 would produce an image with 262,144 pixels along each side, and those rasters are a bit hard to handle. We could rasterize smaller portions of the dataset and then use WebMapTileron those smaller rasters, but this will involve two raster operations (rasterization itself and raster tiling). Besides, if we pick an arbitrary grid, we may end up with incomplete tiles, and will have to take care about mosaicking them together. After some tests I gave up this route and decided to try a direct clipping and rasterization to Bing Maps tiles. That is, the ImageRasterizertransformer should get exact portions of data that are needed to make separate tiles. So now we have only one raster transformer. Another positive side effect of this change is that now we won't produce tiles that don't have any data.
How can we make such portions? The answer is easy – the
Clipper transformer. We can make vector tile polygons that serve as Clippers, and the dataset supplies Clippees. That was the theory. If we tile our world from zoom level 1 down to level 10, we have to create almost 1,500,000 Clippers. That’s a lot. Despite all the improvements, we still have some performance problems when the number of Clippers exceeds 20,000-60,000 (depending on the amount of clipped data).
I tried replacing Clipper with Intersector, which allowed me to go two zoom levels further than Clipper, however, it required some extra work on computing quadkeys, and still, it could not be used for even deeper tiling.
The next logical step in my approximations to the final workflow was making a procedure consisting of two (or more) steps. With step one, I would tile the original dataset into smaller datasets with the extents matching the tile extents of some zoom level (say, 7 or 8 as in my tests), and then make the same tiling and final rasterization for each of those smaller datasets down to zoom levels 14-15.
In fact, there shouldn’t be too much need in a process that would take some data and tile it across the whole zoom level range from level 1 (world) to level 19 (mole-hill). The map of the world and street map have quite different contents, and even if some elements are the same (coast line, for example), they are usually represented by geometries that are quite different - generalized or detailed.
For managing such a workflow, I had to set up the following procedure. At the first stage, the original dataset is cut into smaller portions and stored in some convenient format (preferably with spatial index - an FFS or some spatial database). There are two workspaces at the second stage, one executes another with
WorkspaceRunneror ServerJobSubmitter transformer.
The first workspace decides what portion (what file or database table) of the data should be taken, the second workspace knows how to tile and rasterize whatever is entering it.
Still, the necessity to generate and keep multiple smaller datasets instead of one large dataset is not an ideal solution – it is harder to manage data storage and distribution, easier to loose some parts, etc. This is why a spatial database comes out to play a more significant role in the whole process.
Spatial databases such as Oracle, SQL Server, MySQL or PostGIS are very effective at quick reading subsets of spatial data. If we need to extract some data from a traditional vector format such as MapInfo or DGN, we would have to read the whole dataset, add a polygon that would define the boundaries of the area of interest, and then, with either Clipper or one of the Overlayers (e.g. AreaOnAreaOverlayer)get the data. With database readers getting the same result is just a matter of setting reader parameters, which can be published, and hence, set from an external workspace.
The final workflow published here looks as follows.
We take a feature that represents our area – it can be a country area or simple bounding box. We ask for the parameters – minimum and maximum zoom levels (from which to which zoom level data should be seen), and we also have to specify so called 'tiling level' (I used to call it reading, but it is not very good word for the purpose it is used for).
Using Rasterizer we make a small raster (1*1 or maybe 10*10 pixels to avoid any rounding problems), and pass the this raster through WebMapTiler. We get VE tiles, which know their quadkey names.
Then with BoundsExtractor, we get minimum and maximum X and Y, and they together with the quadkey serve as the parameters to the second workspace.
In the example below, we got 28 VE tiles, what means that we will execute 28 workspaces from zoom level 5 to zoom level 7 (which is that tiling level). BTW, we have someone on Madagascar transforming TIFF to BMP with FME, and maybe at the next conference I’ll tell you how we know it, how we collect and how we use statistics about our users.
Now what do we do in those 28 workspaces? We apply parameters (tile extents) sent from the external workspace to the database reader. With option 'Clip' we get only data that is inside of quadkey we submitted.
After that we do our regular job – coloring and scale selection if needed.
Then, we either simply rasterize with group by what we read into a tile and save it as a PNG file, or we perform additional tiling.
Why some of the tiles are written immediately, and some are tiled again? This depends on tiling level. Again, talking about the image above – data read with the tiles of the zoom levels 5 and 6 is rasterized and written right away.
Data read with tiles of zoom level 7 is also rasterized directly, but also before that, is tiled, to make tiles of zoom levels 8 and 9.
This all gives us great flexibility. Now we can control how many workspaces will be executed, and how many tiles each workspace will make. When we increase the number of tiles (by increasing tiling level number) in the first workspace, we increase the number of workspace that will be executed, which means that each workspace will have less to do. Or, if we submit less tiles, that means less workspaces, and more job (more tiles) per workspace.
Results on FME Desktop - More Workspaces or More Tiles per Workspace?
The diagram below shows tiling up to (or down to) zoom level 8. When I tried to tile everything with one workspace, I had to give up after 36 hours – I didn't see any work going. Either my FME stalled, or I wasn't patient enough – the rest of the numbers tells us that further waiting had no sense.
When the workspaces don't have to make that many clippings, the situation starts to improve. 4000 tiles per workspace seem to be doing much better than 16,000, and the interval between 64 and a thousand tiles per workspace seems to be the optimum. When we have too many workspaces, the necessity to pass data through all other transformers outweighs any advantages of the faster clipping.
Tiling Level # of Workspaces Tiles per wksp
1 1 16384
2 16 4096
3 64 1024
4 320 256
5 1344 64
6 5440 16
7 21824 4
If we need really lots of tiles, the process can go quite slow. Depending on different conditions, my machine was able to produce between 6 and 10 tiles per second, and that means very roughly ~500,000 - 850,000 tiles a day. However, big project, say Canada-wide, with detailness down to zoom level 14-15 would require millions and millions of tiles.
The workflow explained above fits really well into FME Server technology.
In fact, all what has to be done (assuming that FME Server is up and running) is to replace WorkbenchRunner with ServerJobSubmitter. This transformer will take the same parameters, and once changes are made, we are ready to make tiles with FME Server.
More Machines or More Engines?
Initially, we tried a single FME Server quad core machine with a single engine and the results were pretty close to those we got on a machine with FME Desktop, and that was expected. Then we thought, if we put four engines on that quad core machine we would see real improvements in performance. However, we got worse performance with many engines than with just one (columns 1 and 5):
Our next couple of tests compared performance on four machines with single and then with two engines on each. As you can see, results are really close – both around 2.5 hours (columns 2 and 4).
The fastest result was reached with 8 single engine machines. Perhaps, we would have the same for two engine machines, but when you pay for an engine there is probably not much sense to have twice as more to get about the same result.
Why is it so? We checked Task Manager for CPU consumption, and for the most part of the process it was quite low. After some consultations with more technically experienced Safers, we agreed that the bottleneck is I/O – we simply can't write more files in a given time period, so multiple engines will just wait when they will get their chance to write something, but the outcome will be the same or worse – just because we make such a crowd.
FME Server Tiling Performance
Then we made some simple performance tests to get the idea how fast we can generate our tiles. We tiled the VMap contours down to zoom level 8. For this we ran our tiling process with 1, 2, 4, and 8 machines, and as you can see, we decreased out time from two hours to less that 20 minutes.
Then we tried how fast we can be if we go to level 10 and 11, and the table shows the numbers. 250,000 tiles can be done in about 2 hours, and one million tiles was completed in 6.4 hours. On a single machine, it would take almost two days.
Level Tiles Minutes (hours)
8 24,500 18 (0.3)
10 245,000 105 (1.75)
11 1,000,000 384 (6.4)
Now it is easy to imagine that tiling a big area with multiple layers down to zoom level 14 or 15 can take either very long time or a lot of computing resources.
So, if we had unlimited number of computers how fast we could reach our goal?
This graph does not deal with total time; it shows the average and the maximum time per workspace.
The process illustrated with this graph generates about 3500 tiles, taking about half an hour for a single workspace. When the number of workspaces grows, an average and maximum times are going down. With 82 workspaces, that is, with 82 engines, this process can finish in half a minute (if there is no other bottle necks).
FME Server now can work in
the cloud, and the tests described above were also performed by our colleagues at
It is really easy to start making tiles with FME. Only FME Desktop is needed to start making them as long as the number of tiles does not exceed hundreds of thousands or a few millions of tiles.
The results of the tiling process are not limited to a single mapping platform Bing Maps. It can be used with Google Earth and other web mapping platforms.
The general idea how to split data processing into multiple workspaces is not limited to the described process and as such, is even more valuable.
FME Server allows much quicker data production, so if your volumes are big or you need frequent updates, this is the way to go.
Even if FME Server not enough, FME now lives in a cloud, which can scale out really wide.
Data, Workspaces, Links
World_Contours.zip - Source data in FFS format
ffs2postgis.fmw - Workspace to translate FFS to PostGIS
RasterizerRunnerPostGIS.fmw - Master Workspace to execute secondary workspaces