Article created with FME Desktop 2016.1

**Intro: Getting Started with the RCaller | Next: RCaller: Is Tree Height and Tree Width Correlated?**

If you need more advanced statistics than the StatisticsCalculator transformer provides, the RCaller transformer makes them possible in FME by letting you run R scripts inside a workspace.


Before you can use the RCaller, you need to install the R interpreter and the appropriate R packages. See the section Installing the R Interpreter in the FME User Documentation.

Before you get going with the RCaller, there are a couple of useful things to remember:

- R doesn't work with UNC path names, so you can't run an FME workspace stored on a UNC path (e.g. \\myprojects\fmeWorkspaces). Run your FME workspaces from a mapped drive instead (e.g. f:\myprojects\fmeWorkspaces).
- Do a bit of reading about R if you're not already familiar with its concepts. This is a pretty good tutorial. More resources are listed at the end of this article.

FME adds a new input port to the RCaller each time you connect a transformer or feature type to the Connect Input port. The new port inherits its name from the source object (i.e. the transformer name or the feature type name).

The port names are used as the data frame names in R, so rename the ports to something you'll be able to use in your R scripts.

FME loads your data into a temporary SQLite database, so for both performance and clarity, select only the attributes you're going to use in your R scripts, and make sure their data types are correct (for example, values you want to analyze numerically should not arrive as strings).

FME transfers the data into R as an R data frame. You can access the data frame, or its columns, in your R script by dragging items from the Data Frames menu.

So, to access the vector of Estimated values, drag the Data - Estimated item into the script window and you'll see Data$Estimated in your R script.
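To see outside FME what the RCaller hands your script, you can build a small data frame yourself in the R Console. In this sketch `Data` stands in for an input port renamed to Data, and the values are copied from the first three rows of the sample table later in this article. The `[[ ]]` form and the `str()` type check are plain R, not something the RCaller inserts for you.

```r
# Stand-in for the data frame the RCaller creates from an input port
# renamed to "Data" (values from the sample table in this article).
Data <- data.frame(
  Estimated = c(46.14, 43.89, 42.15),
  Actual    = c(59.5, 34.1, 25.8)
)

# Data$Estimated is what dragging "Data - Estimated" into the script inserts.
est <- Data$Estimated

# Data[["Estimated"]] returns the same vector and also works when the
# column name is held in a variable.
col <- "Estimated"
same <- identical(est, Data[[col]])

# str() lists each column's type; columns you analyze should be numeric.
str(Data)
```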

This is not an R tutorial; to learn more about R, see the resources section at the end of this article. If you're new to R, I'd recommend using the R Console to develop and debug your scripts: you'll get better feedback, and it's a little easier to see the intermediate results. Build your script incrementally in the R Console so it's clear where any issues arise, then copy and paste it into the RCaller. You can load a sample of your data using the R readers, for example:

Data = read.csv("D:/tmp/SampleData.csv")

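**Note:** In R strings, write paths with forward slashes ('/') rather than single backslashes ('\'). A backslash starts an escape sequence in an R string, so if you do use it, it must be doubled ('\\').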
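If you don't have a CSV export handy, the round trip can be tried entirely in the R Console. This is a minimal sketch with made-up values; it writes the sample to R's session temporary folder (an assumption standing in for D:/tmp) and reads it back with `read.csv()`.

```r
# Made-up sample standing in for an export of your FME data.
sampleData <- data.frame(Code      = c("TXU", "ABD"),
                         Estimated = c(46.14, 43.89),
                         Actual    = c(59.5, 34.1))

# file.path() joins its parts with "/", which R accepts on Windows too.
csvPath <- file.path(tempdir(), "SampleData.csv")
write.csv(sampleData, csvPath, row.names = FALSE)

# "D:/tmp/SampleData.csv" and "D:\\tmp\\SampleData.csv" are both valid
# R strings for the same Windows file; a single "\" is not.
Data <- read.csv(csvPath)
str(Data)
```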

... can be tricky! The RCaller passes data back to FME via a data frame called 'fmeOutput'. Each row in the data frame becomes a separate output feature in FME. If you know how to build and append to R data frames, you can probably skip this section.

To populate the fmeOutput data frame, you can simply pass back a set of single values (vectors of length one), for example:

> Data = read.csv("D:/tmp/SampleData.csv")
> meanAct = mean(Data$Actual)
> meanEst = mean(Data$Estimated)
> fmeOutput = data.frame(meanEst, meanAct)

This results in a single FME feature carrying the two mean values as the FME attributes meanEst and meanAct.
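A self-contained version of the same idea, for trying in the R Console without a CSV file. The inline values are borrowed from the sample table in this article; in the RCaller, `Data` would come from the input port instead.

```r
# Stand-in for the RCaller input data frame "Data".
Data <- data.frame(Estimated = c(46.14, 43.89, 42.15),
                   Actual    = c(59.5, 34.1, 25.8))

meanAct <- mean(Data$Actual)
meanEst <- mean(Data$Estimated)

# Each argument is a length-one vector, so fmeOutput has exactly one row:
# one output feature with the attributes meanEst and meanAct.
fmeOutput <- data.frame(meanEst, meanAct)
print(fmeOutput)
```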

But many R functions return more complex results, for example the linear regression function lm(), fitting y = mx + k:

lm.linear <- lm(Data$Actual ~ Data$Estimated)

Use the R *summary()* function to see the results:

> summary(lm.linear)
Call:
lm(formula = Data$Actual ~ Data$Estimated)
Residuals:
    Min      1Q  Median      3Q     Max
-9.9667 -2.1022  0.2679  2.3813  8.3354
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    12.051001   9.149612   1.317    0.211
Data$Estimated -0.009291   0.861531  -0.011    0.992
Residual standard error: 5.045 on 13 degrees of freedom
Multiple R-squared:  8.946e-06,  Adjusted R-squared:  -0.07691
F-statistic: 0.0001163 on 1 and 13 DF,  p-value: 0.9916

How do you get that back into FME?

The *names()* function gives you the names of the components in the summary, for example:

> names(summary(lm.linear))
 [1] "call"          "terms"         "residuals"     "coefficients"  "aliased"
 [6] "sigma"         "df"            "r.squared"     "adj.r.squared" "fstatistic"
[11] "cov.unscaled"

But some of these are more complex objects in their own right; "coefficients", for example, is a matrix:

> summary(lm.linear)$coefficients
                   Estimate Std. Error     t value  Pr(>|t|)
(Intercept)    12.051001492  9.1496116  1.31710525 0.2105480
Data$Estimated -0.009291111  0.8615308 -0.01078442 0.9915592

So what do you do if you want to return the common results of the y = mx + k analysis, such as the R-squared value, m and k, to FME?

You have to pick out the values you need and pass them to the fmeOutput data frame. In the above example, m (the slope) is the Estimate in the Data$Estimated row (-0.009291111), k (the y intercept) is the Estimate in the (Intercept) row (12.051001492), and the R-squared result is the simple value r.squared. So you can use:

k <- summary(lm.linear)$coefficients[1,1]  # row 1 (Intercept), column 1 (Estimate)
m <- summary(lm.linear)$coefficients[2,1]  # row 2 (Data$Estimated), column 1 (Estimate)
r2 <- summary(lm.linear)$r.squared
fmeOutput <- data.frame(m, k, r2)

That was easy!
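Indexing by position works, but it quietly picks up the wrong value if the model formula changes. A slightly more defensive sketch, using made-up data: passing the columns through lm()'s `data` argument names the coefficient rows after the columns ("Estimated" rather than "Data$Estimated"), so the values can be picked out by row and column name.

```r
# Made-up sample data for illustration only.
Data <- data.frame(Estimated = c(1, 2, 3, 4, 5),
                   Actual    = c(5.1, 6.9, 9.2, 10.8, 13.0))

# With data = Data the coefficient rows are "(Intercept)" and "Estimated".
lm.linear <- lm(Actual ~ Estimated, data = Data)
coefs <- summary(lm.linear)$coefficients

k  <- coefs["(Intercept)", "Estimate"]  # y intercept
m  <- coefs["Estimated", "Estimate"]    # slope
r2 <- summary(lm.linear)$r.squared

# One row, so one output feature with the attributes m, k and r2.
fmeOutput <- data.frame(m, k, r2)
print(fmeOutput)
```

For this made-up data the slope works out to 1.97 and the intercept to 3.09.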

The workspace rcallerlinearregression.fmwt illustrates the example described above.

One final tip: expose the result variables in the RCaller to make life easier in Workbench.

In some cases it may not be appropriate to use a data frame for your R results, for example a large raster or an image. In this case you can export your R results to a temporary data file and have FME re-read those results. The article RCaller: Interpolate Points to Raster Through Kriging illustrates how you can do this.
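A minimal sketch of the R side only, assuming you return the file location through fmeOutput so a reader later in the workspace can pick the file up. The grid, the file name and the `resultPath` attribute are all hypothetical; the Kriging article shows the approach actually used there.

```r
# Hypothetical result too bulky for one-feature-per-row: a 100 x 100 grid.
grid <- matrix(runif(100 * 100), nrow = 100, ncol = 100)

# tempdir() is deleted when R exits, so write to its parent folder
# (usually the system temp folder), which outlives the R session.
resultPath <- file.path(dirname(tempdir()), "rcaller_grid.csv")
write.csv(grid, resultPath, row.names = FALSE)

# Return only the location; FME then reads the file itself.
fmeOutput <- data.frame(resultPath, stringsAsFactors = FALSE)
```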

For many statistical problems, you have a qualitative value, such as the codes ABC, ABD and TXU, that has some bearing on the quantitative values, so simple grouping makes a lot of sense.

For example, you might want to calculate statistics, such as the mean, separately for each Code value:

Date        Code  Estimated  Actual
11/29/2016  TXU   46.14      59.5
11/28/2016  ABD   43.89      34.1
11/27/2016  TXU   42.15      25.8
11/27/2016  ABC   9.3        20.3
11/26/2016  ABD   42.15      50.6
11/25/2016  ABC   11.04      11.7
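For a plain per-Code mean you don't need a loop at all. This sketch swaps in base R's `aggregate()` (not necessarily what the sample workspace uses), applied to the six sample rows above; it returns one row per Code, which the RCaller would emit as one feature per Code.

```r
# The six sample rows from the table above.
Data <- data.frame(
  Date      = c("11/29/2016", "11/28/2016", "11/27/2016",
                "11/27/2016", "11/26/2016", "11/25/2016"),
  Code      = c("TXU", "ABD", "TXU", "ABC", "ABD", "ABC"),
  Estimated = c(46.14, 43.89, 42.15, 9.3, 42.15, 11.04),
  Actual    = c(59.5, 34.1, 25.8, 20.3, 50.6, 11.7),
  stringsAsFactors = FALSE
)

# Mean of Estimated and Actual for each Code, one row per Code.
fmeOutput <- aggregate(cbind(Estimated, Actual) ~ Code, data = Data, FUN = mean)
print(fmeOutput)
```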

You can put your analysis in a loop, subset the data by Code, and then calculate the regression for each subset. Something like:

for (currentCode in unique(Data$Code)) {  # assuming the input data frame is 'Data'
  tmpData = Data[Data$Code == currentCode, ]
  lm.linear = lm(tmpData$Actual ~ tmpData$Estimated)  # linear model on y = mx + k
}

But you can't just use:

r2 <- summary(lm.linear)$r.squared

to return the result: after the loop, lm.linear only holds the model for the last Code, so you'd get a single r2 instead of one for each of the three codes.

One approach is to build vectors for each result and then copy these result vectors to the fmeOutput data frame.

# initialize vectors to hold results
r2 <- c()
m <- c()
k <- c()
Code <- character()
# y = Actual, x = Estimated
for (currentCode in unique(Data$Code)) {
  tmpData = Data[Data$Code == currentCode, ]
  # linear regression for y = mx + k
  lm.linear = lm(tmpData$Actual ~ tmpData$Estimated)
  # append this group's results to the result vectors
  r2 = c(r2, summary(lm.linear)$r.squared)
  k = c(k, summary(lm.linear)$coefficients[1,1])
  m = c(m, summary(lm.linear)$coefficients[2,1])
  Code = c(Code, currentCode)
}
fmeOutput <- data.frame(Code, m, k, r2)

You can also assign the results directly to a data frame, which may be more efficient than growing vectors one value at a time, if you can figure out how.

The workspace rcallerlinearregressionwithgroups.fmwt illustrates the example described above.
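One way to build the results directly into a data frame, as suggested above, is to `split()` the input by Code, fit each group with `lapply()`, and stack the one-row results with `do.call(rbind, ...)`. This is a sketch with made-up data (two codes, four rows each), not the method used in the sample workspace; for a handful of groups any speed difference from the loop version is negligible.

```r
# Made-up data: two codes with four rows each.
Data <- data.frame(
  Code      = rep(c("A", "B"), each = 4),
  Estimated = c(1, 2, 3, 4, 1, 2, 3, 4),
  Actual    = c(2.1, 3.9, 6.2, 7.8, 10, 8.1, 5.9, 4.0),
  stringsAsFactors = FALSE
)

# split() returns a list of data frames, one per Code, named by Code.
groups <- split(Data, Data$Code)

# Fit y = mx + k for each group and return a one-row data frame per group.
rows <- lapply(names(groups), function(code) {
  fit   <- lm(Actual ~ Estimated, data = groups[[code]])
  coefs <- coef(fit)
  data.frame(Code = code,
             m    = unname(coefs["Estimated"]),
             k    = unname(coefs["(Intercept)"]),
             r2   = summary(fit)$r.squared,
             stringsAsFactors = FALSE)
})

# Stack the rows into fmeOutput: one output feature per Code.
fmeOutput <- do.call(rbind, rows)
print(fmeOutput)
```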

**Debugging your R Script**

If you are relatively new to R, I recommend that you first develop your script in the R Console and then transfer it to the RCaller. It's a lot easier to debug there (see the section Building an R Script above). If you encounter the RCaller error:

ERROR |RCaller(InlineQueryFactory): InlineQueryFactory failed with exit code 1 when executing R script. Output was: Loading required package: gsubfn Loading required package: proto Loading required package: RSQLite

This seems to be a common error response when there is a syntax error or an undefined variable reference in your script, so carefully check your script for unassigned variables and misspellings.

Remember, like FME, R is case sensitive.
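When testing in the R Console, a check like the one below turns a vague failure caused by a misspelled or wrongly cased column name into a named one. `requireColumns()` is a hypothetical helper, not part of R or the RCaller, and whether its message surfaces in the RCaller log has not been verified here.

```r
# Hypothetical input frame; note the column is "Actual", not "actual".
Data <- data.frame(Estimated = c(1, 2, 3), Actual = c(2, 4, 7))

# Hypothetical helper: stop with a readable message if any required
# column is missing (including differences in case).
requireColumns <- function(df, cols) {
  absent <- setdiff(cols, names(df))
  if (length(absent) > 0) {
    stop("Missing column(s): ", paste(absent, collapse = ", "),
         ". Available: ", paste(names(df), collapse = ", "))
  }
  invisible(TRUE)
}

requireColumns(Data, c("Estimated", "Actual"))  # passes silently

# A wrongly cased name is reported instead of failing later inside lm().
msg <- tryCatch(requireColumns(Data, c("estimated")),
                error = function(e) conditionMessage(e))
print(msg)
```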

Here are some useful resources on using R in FME:

- FME RCaller documentation
- 'R' tutorials: http://www.r-tutor.com/ and here
- Extracting 'summary' information using summary(): example here
- Appending to a data frame examples
- Knowledge Center RCaller articles:
  - RCaller: Interpolate Points to Raster Through Kriging
  - RCaller: Is Tree Height and Tree Width Correlated?

**Continue to RCaller: Is Tree Height and Tree Width Correlated?**

© 2019 Safe Software Inc | Legal
