Questions in topic: "k-means cluster analysis fme python pythoncaller pointclustering"
https://knowledge.safe.com/questions/topics/single/59940.html
Clustering Points in FME (Cluster Size)
https://knowledge.safe.com/questions/73841/clustering-points-in-fme-cluster-size.html
<p>Hi All,</p><p>I have a total of <em>n</em> points spread spatially over a given area. I want to group them into <em>x</em> clusters of <em>y</em> members each, as spatially optimal as possible.</p><p>So I would like to set the following parameters in the transformer:</p><p>- The number of clusters</p><p>- The size of the clusters, or the min. and max. size of a cluster</p><p>Has anyone got any ideas? I have already tried working with the cluster modeller, K-means &amp; RClusterCalculator, with no success.</p><p>Thanks in advance!</p>
cluster analysis, clustermodeller, clusters, k-means cluster analysis fme python pythoncaller pointclustering
Thu, 05 Jul 2018 18:21:26 GMT, bishoymf
K-Means point clustering using FME
https://knowledge.safe.com/idea/59941/k-means-point-clustering-using-fme.html
<div>
<h1>Clustering Points With
K-Means in FME</h1>
<p><strong>This article was produced using FME 2017.1</strong></p>
<p>For a project to assess the quality of suppliers, five suppliers were asked to perform comparisons on aerial imagery: they had to select areas on aerial photographs where potentially dangerous situations might arise as a result of human activity. After the suppliers delivered their first reports for a test area, one of the obstacles for the project turned out to be identifying clusters of reports. A cluster of such reports represents an area where a potentially dangerous situation has arisen. A criterion for selecting the best supplier could be as follows: the best supplier is the one whose reports “touch” the maximum number of clusters with as few notifications as possible. Because FME played an important role in this project and no suitable transformer seemed to be available, a study was made to implement the K-Means clustering algorithm in an FME transformer.</p>
<h2>Background article</h2>
<p>The result of a K-Means clustering is a grouping of points into clusters such that the sum of the squared distances from each point to the centroid of its cluster is (locally) minimal, and every point lies closer to the centroid of its own cluster than to any other centroid. The inputs to the analysis are a list of points and the number of clusters to generate. A nice article on this subject can be found here:</p>
<p><a href="https://datasciencelab.wordpress.com/2013/12/12/clustering-with-k-means-in-python/">https://datasciencelab.wordpress.com/2013/12/12/clustering-with-k-means-in-python/</a></p>
<p>The article also contains a detailed Python script that can be used in FME. The rest of this article describes that implementation. For further information on the Python algorithm or on K-Means clustering in general, please refer to the article above.</p>
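<p>To make the idea concrete, here is a minimal sketch of the standard K-Means iteration (Lloyd's algorithm) in plain Python. It is not the script from the linked article and not the code inside the transformer: the function names assign_points, update_centers and k_means are illustrative, and it uses only the standard library rather than numpy.</p>

```python
import math
import random


def assign_points(points, centers):
    """Group the points by the index of their nearest center."""
    clusters = {i: [] for i in range(len(centers))}
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def update_centers(clusters, old_centers):
    """Move each center to the mean of its members; keep it if it has none."""
    new_centers = []
    for i, old in enumerate(old_centers):
        members = clusters[i]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def k_means(points, k, seed=0, max_iter=100):
    """Return (centers, clusters); clusters maps a cluster number to its points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    clusters = assign_points(points, centers)
    for _ in range(max_iter):
        new_centers = update_centers(clusters, centers)
        if new_centers == centers:  # converged: the centers no longer move
            break
        centers = new_centers
        clusters = assign_points(points, centers)
    return centers, clusters


# Two well-separated groups of three points each
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = k_means(points, 2)
for number, members in sorted(clusters.items()):
    print(number, members)
```

<p>The result depends on the randomly chosen starting centers, which is why the sketch takes a seed. On less clearly separated data, different seeds can give different groupings.</p>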
<h2>Outline</h2>
<p>The
solution in FME is explained in the following steps:</p>
<p>1. Import and control of source dataset</p>
<p>2. Collect input coordinates and call the
Python-script</p>
<p>3. Collect the resulting clusters</p>
<p>An FME custom transformer, the PointClusterer, was built for the solution. The steps are explained below.</p>
<h4>Import and control of source dataset</h4>
<p>The custom transformer has only one parameter: the number of clusters to generate. During experiments, the algorithm failed when, for example, a negative number of clusters was entered, or a value greater than the total number of points in the source data. The number of clusters is therefore exposed as a user parameter of type number, positive and with 0 decimals, which forces the input to a positive integer and prevents the user from entering invalid values. A separate check (using a Tester) ensures that the number of clusters to generate does not exceed the number of input points.</p>
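<p>In the transformer these checks are done by the parameter type and a Tester. For readers who want the same guard inside a script, the sketch below expresses the two rules in Python. The function name validate_cluster_count is hypothetical and is not part of the PointClusterer.</p>

```python
def validate_cluster_count(k, num_points):
    """Return k unchanged, or raise ValueError if it cannot be used."""
    if isinstance(k, bool) or not isinstance(k, int):
        raise ValueError("number of clusters must be a whole number")
    if k < 1:
        raise ValueError("number of clusters must be positive")
    if k > num_points:
        raise ValueError("number of clusters exceeds the number of points")
    return k


# 7 clusters for 100 points is fine; -1 or 101 clusters would raise ValueError
k = validate_cluster_count(7, 100)
```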
<h4>Collect input coordinates and call the
Python-script</h4>
<p>Incoming features are first checked for geometry: only point geometry is allowed. The coordinates are extracted from the points and collected into a list of X, Y values using an Aggregator. The number of input points is counted and passed as an attribute to a PythonCaller, in which the actual call to the clustering algorithm takes place. The PythonCaller reads the number of clusters to generate, which originates from the user parameter, in the same way. A numpy array is then filled with the coordinates from the X, Y list (using feature.getAttribute).</p>
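<p>A sketch of how the coordinates might be read inside the PythonCaller is shown below. The attribute names (_point_count, _num_clusters, _coords{}.x and _coords{}.y) are assumptions made for illustration, since the names used in the PointClusterer are not given here. StubFeature is only a stand-in so the example runs outside FME; inside a PythonCaller the feature object supplied by FME would be used instead. Plain tuples are used in place of a numpy array to keep the example dependency-free.</p>

```python
class StubFeature:
    """Stand-in for an FME feature so this sketch runs outside FME."""

    def __init__(self, attributes=None):
        self._attributes = dict(attributes or {})

    def getAttribute(self, name):
        # Like an FME feature, return None for a missing attribute
        return self._attributes.get(name)

    def setAttribute(self, name, value):
        self._attributes[name] = value


def read_points(feature):
    """Collect (x, y) tuples and the cluster count from one aggregated feature."""
    count = int(feature.getAttribute("_point_count"))   # hypothetical name
    k = int(feature.getAttribute("_num_clusters"))      # hypothetical name
    points = []
    for i in range(count):
        # List elements are addressed by index, e.g. _coords{0}.x
        x = float(feature.getAttribute("_coords{%d}.x" % i))
        y = float(feature.getAttribute("_coords{%d}.y" % i))
        points.append((x, y))
    return points, k


feature = StubFeature({
    "_point_count": 2, "_num_clusters": 1,
    "_coords{0}.x": "1.5", "_coords{0}.y": "2.0",
    "_coords{1}.x": "3.0", "_coords{1}.y": "4.5",
})
points, k = read_points(feature)
```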
<h4>Collect the resulting clusters</h4>
<p>The call to the find_centers function in the Python script returns the convergence criterion and the list of generated clusters (in numpy array format). To pair each cluster number with its coordinates, the list is iterated and every coordinate, together with its assigned cluster number, is placed in a new list attribute on the FME feature (using feature.setAttribute). Outside the PythonCaller, this list is split into its individual elements, which restores the original coordinates along with their assigned cluster numbers. The coordinate values are used to rebuild the point geometries (using a VertexCreator), and the points are then combined into one aggregate per cluster number (using an Aggregator).</p>
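<p>The step of writing the clusters back to the feature could look roughly like the sketch below. It assumes the clusters arrive as a dictionary that maps a cluster number to a list of (x, y) pairs, and it uses a hypothetical list attribute named _clustered{}; neither detail is confirmed for the actual transformer. StubFeature again only stands in for the FME feature so the example can run outside FME.</p>

```python
class StubFeature:
    """Stand-in for an FME feature so this sketch runs outside FME."""

    def __init__(self):
        self._attributes = {}

    def getAttribute(self, name):
        return self._attributes.get(name)

    def setAttribute(self, name, value):
        self._attributes[name] = value


def write_clusters(feature, clusters):
    """Store each point with its cluster number as one list element."""
    index = 0
    for cluster_number, members in sorted(clusters.items()):
        for x, y in members:
            # One list element per point: _clustered{0}.x, _clustered{0}.y, ...
            feature.setAttribute("_clustered{%d}.x" % index, x)
            feature.setAttribute("_clustered{%d}.y" % index, y)
            feature.setAttribute("_clustered{%d}.cluster" % index, cluster_number)
            index += 1
    return index  # number of list elements written


clusters = {0: [(0.0, 0.0), (0.0, 1.0)], 1: [(10.0, 10.0)]}
feature = StubFeature()
total = write_clusters(feature, clusters)
```

<p>Storing one list element per point keeps the coordinates and the cluster number together, so the list can later be split into one feature per point.</p>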
<h2>Results</h2>
<p>The custom transformer can be downloaded from the FME Hub: <a href="https://hub.safe.com/transformers/pointclusterer">https://hub.safe.com/transformers/pointclusterer</a>. It has been used on two small datasets, with the following results:</p>
<p>- A regular grid of 10x10 points grouped into 7 clusters [fig. 1]: the translation took 1 second in total.</p>
<p>- A series of approx. 300 Dutch municipalities grouped into 7 clusters [fig. 2]: processing took 4 seconds, about 3 seconds of which were spent reading the data with a PostGIS reader and replacing the polygon features with an inside point.</p>
<p>For clarity, the points shown in the figures are buffered and colored.</p>
<p>With larger inputs (about 3000 points, 60 clusters), the algorithm takes about 30 seconds to arrive at a result.</p><p><img src="/storage/attachments/13528-regular-10x10.png" style="width: 294px;"><img src="https://knowledge.safe.com/storage/attachments/13529-dutchmunicipalities.png" style="background-color: initial; width: 262px;"></p><p>Fig. 1: 7 clusters from a regular 10x10 grid. Fig. 2: 7 clusters from approx. 300 Dutch municipalities.</p>
<p>The last example might be an idea for a regional
reclassification in the Netherlands?</p>
<h2>Conclusions /
Recommendations</h2>
<p>It turns out to be quite easy to incorporate a piece of Python functionality in FME, and the resulting transformer can easily be shared with other interested parties via the FME Hub. The transformer could be extended with other clustering methods, or adapted so that objects other than points can be clustered. The transformer was not used in the project mentioned in the introduction; that project merely served as the inspiration for it. If it were used in the project, the transformer could be adapted by choosing an algorithm that derives the optimal number of clusters from the data itself. See for example <a href="https://beta.vu.nl/nl/Images/werkstuk-hendrikson_tcm235-91358.pdf">https://beta.vu.nl/nl/Images/werkstuk-hendrikson_tcm235-91358.pdf</a></p>
</div>
k-means cluster analysis fme python pythoncaller pointclustering
Wed, 13 Dec 2017 14:32:30 GMT, helmoet