The Find Point Clusters tool finds clusters of point features in surrounding noise based on their spatial or spatiotemporal distribution.
Analysis using GeoAnalytics Tools
Analysis using GeoAnalytics Tools is run using distributed processing across multiple ArcGIS GeoAnalytics Server machines and cores. GeoAnalytics Tools and standard feature analysis tools in ArcGIS Enterprise have different parameters and capabilities. To learn more about these differences, see Feature analysis tool differences.
A nongovernmental organization is studying a particular pest-borne disease and has a point dataset representing households in a study area, some of which are infested and some of which are not. Using the Find Point Clusters tool, an analyst can identify clusters of infested households to help pinpoint an area in which to begin treatment and extermination of pests.
The input for Find Point Clusters is a single point layer.
The Choose the clustering method you want to use parameter determines whether a defined distance or a self-adjusting clustering algorithm is used. Defined distance (DBSCAN) finds clusters of points that are in close proximity based on a specified search range. Self-adjusting (HDBSCAN) finds clusters in a similar way but uses varying search ranges, which allows it to find clusters of varying densities based on cluster probability (or stability).
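To make the Defined distance idea concrete, here is a minimal pure-Python DBSCAN sketch. It is not the GeoAnalytics implementation: it assumes planar (x, y) coordinates, Euclidean distance, and that a point counts toward the minimum within its own search range. The function name `dbscan` and its parameters are hypothetical. Cluster IDs are numbered in discovery order starting at 0 and need not match the tool's CLUSTER_ID values; noise is labeled -1, as in the tool's output.

```python
import math

def dbscan(points, search_distance, min_points):
    """Label each (x, y) point with a cluster ID, or -1 for noise.

    Assumption: a point is a core point when at least min_points points
    (itself included) lie within search_distance of it.
    """
    def neighbors(i):
        # Brute-force search range query; fine for a small illustration.
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= search_distance]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster_id += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], 1.5, 3)` returns `[0, 0, 0, 1, 1, 1, -1]`: two tight groups become clusters and the isolated point is noise.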
If DBSCAN is chosen, clusters can be found either in two-dimensional space only or in both space and time. If you select Use time to find clusters and the input layer has time enabled and is of type instant, DBSCAN will discover spatiotemporal clusters of points that are in close proximity based on a specified search distance and search duration.
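Conceptually, spatiotemporal DBSCAN only needs a stricter neighbor test: a point counts as a neighbor only if it falls within both the search distance and the search duration. The sketch below assumes planar coordinates and Python `datetime` instants; `is_spatiotemporal_neighbor` is a hypothetical helper, not part of any ArcGIS API.

```python
import math
from datetime import datetime, timedelta

def is_spatiotemporal_neighbor(a, b, search_distance, search_duration):
    """Return True when point b is within both the search distance and the
    search duration of point a. Each point is an (x, y, instant) tuple."""
    (xa, ya, ta), (xb, yb, tb) = a, b
    close_in_space = math.hypot(xa - xb, ya - yb) <= search_distance
    close_in_time = abs(ta - tb) <= search_duration  # abs() of a timedelta
    return close_in_space and close_in_time

# Two readings 0.5 units and 1 hour apart are neighbors; the same location
# two days later is not, given a 2-hour search duration.
p = (0.0, 0.0, datetime(2024, 1, 1, 8))
q = (0.5, 0.0, datetime(2024, 1, 1, 9))
r = (0.5, 0.0, datetime(2024, 1, 3, 9))
print(is_spatiotemporal_neighbor(p, q, 1.0, timedelta(hours=2)))  # True
print(is_spatiotemporal_neighbor(p, r, 1.0, timedelta(hours=2)))  # False
```

Swapping this predicate for the distance-only test in a DBSCAN-style search is one way to picture how time narrows the set of candidate cluster members.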
HDBSCAN currently only supports spatial clustering and will not use time to discover clusters.
All results will include a field named CLUSTER_ID that indicates which cluster each feature belongs to, and a field named COLOR_ID that is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both fields, a value of -1 indicates that a feature has been labeled as noise.
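If you bring the result features into Python, a quick tally of CLUSTER_ID values shows how many clusters were found and how much of the data was labeled noise. This sketch assumes you already have the CLUSTER_ID values as a plain sequence (how you export them is outside its scope); `summarize_clusters` is a hypothetical helper.

```python
from collections import Counter

def summarize_clusters(cluster_ids):
    """Return (features per cluster ID, number of noise features) from a
    sequence of CLUSTER_ID values, treating -1 as noise."""
    counts = Counter(cluster_ids)
    noise = counts.pop(-1, 0)  # remove the noise label from the cluster tally
    return dict(counts), noise

sizes, noise = summarize_clusters([1, 1, 2, -1, 2, 2, -1])
print(sizes, noise)  # {1: 2, 2: 3} 2
```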
If the Defined distance (DBSCAN) clustering method is used with time to discover spatiotemporal clusters, results will also include the following fields:
- FEAT_TIME—The original instant time of each feature.
- START_DATETIME—The start time of the time extent of the cluster a feature belongs to.
- END_DATETIME—The end time of the time extent of the cluster a feature belongs to.
The result layer's time will be set as an interval on the START_DATETIME and END_DATETIME fields, ensuring that in most cases all cluster members are drawn together when visualizing spatiotemporal clusters with a time slider. For noise features, START_DATETIME and END_DATETIME will be equal to FEAT_TIME.
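One way to picture how START_DATETIME and END_DATETIME relate to FEAT_TIME is sketched below. It assumes a cluster's time extent runs from its earliest to its latest member instant, which is how the field descriptions read but is not spelled out above. `cluster_time_extents` and its (cluster_id, feat_time) input pairs are hypothetical.

```python
from datetime import datetime

def cluster_time_extents(features):
    """Given (cluster_id, feat_time) pairs, return
    (cluster_id, feat_time, start_datetime, end_datetime) tuples.

    Cluster members share their cluster's earliest and latest FEAT_TIME;
    noise features (cluster_id == -1) keep START = END = FEAT_TIME.
    """
    extents = {}
    for cid, t in features:
        if cid == -1:
            continue
        start, end = extents.get(cid, (t, t))
        extents[cid] = (min(start, t), max(end, t))
    rows = []
    for cid, t in features:
        start, end = (t, t) if cid == -1 else extents[cid]
        rows.append((cid, t, start, end))
    return rows

features = [
    (0, datetime(2024, 5, 1, 10)),
    (0, datetime(2024, 5, 1, 12)),
    (-1, datetime(2024, 5, 2, 9)),
    (0, datetime(2024, 5, 1, 11)),
]
rows = cluster_time_extents(features)
```

Because every member of cluster 0 carries the same 10:00 to 12:00 interval, a time slider positioned anywhere in that window draws the whole cluster at once.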
If the Self-adjusting (HDBSCAN) clustering method is used, results will also include the following fields:
- PROB—The probability that a feature belongs in its assigned cluster.
- OUTLIER—The likelihood that a feature is an outlier within its own cluster. A larger value indicates that the feature is more likely to be an outlier.
- EXEMPLAR—Indicates which features are most representative of each cluster. These features are indicated by a value of 1.
- STABILITY—The persistence of each cluster across a range of scales. A larger score indicates that a cluster persists over a wider range of distance scales.
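A common next step with HDBSCAN output is to separate the representative and the weakly attached members of each cluster. The sketch below assumes result rows are available as dictionaries keyed by the output field names and that PROB lies between 0 and 1. The 0.5 cutoff in the example is illustrative, not a recommended value, and `split_hdbscan_results` is a hypothetical helper.

```python
def split_hdbscan_results(rows, min_prob):
    """Return (exemplars, confident, uncertain) lists of cluster members.

    Each row is a dict with CLUSTER_ID, PROB, and EXEMPLAR keys; noise rows
    (CLUSTER_ID == -1) are skipped.
    """
    exemplars, confident, uncertain = [], [], []
    for row in rows:
        if row["CLUSTER_ID"] == -1:
            continue
        if row["EXEMPLAR"] == 1:
            exemplars.append(row)  # most representative member of its cluster
        if row["PROB"] >= min_prob:
            confident.append(row)
        else:
            uncertain.append(row)
    return exemplars, confident, uncertain

rows = [
    {"CLUSTER_ID": 1, "PROB": 1.0, "EXEMPLAR": 1},
    {"CLUSTER_ID": 1, "PROB": 0.4, "EXEMPLAR": 0},
    {"CLUSTER_ID": -1, "PROB": 0.0, "EXEMPLAR": 0},
]
exemplars, confident, uncertain = split_hdbscan_results(rows, 0.5)
```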
The Minimum number of points to be considered a cluster parameter is used differently depending on the clustering method chosen:
- Defined distance (DBSCAN)—Specifies the number of features that must be found within a search range of a point for that point to start forming a cluster. The results may include clusters with fewer features than this value. The search range distance is set using the Limit the search range to parameter. When using time to find clusters, an additional search duration is required and is set using the Limit the search duration to parameter. When searching for cluster members, the specified minimum number of features must be found within the specified search range and search duration to form a cluster. Note that this distance and duration are not related to the diameter or time extent of the point clusters discovered.
- Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point itself) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.
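In the standard HDBSCAN formulation, density around a point is estimated from its core distance: the distance to its k-th nearest neighbor, where k is the minimum number of points and the point itself counts as the first neighbor. The pure-Python sketch below computes that quantity by brute force, assuming planar coordinates; it illustrates the concept and is not how GeoAnalytics Server computes it. `core_distances` is a hypothetical helper.

```python
import math

def core_distances(points, min_points):
    """Core distance of each (x, y) point: the distance to its min_points-th
    nearest neighbor, counting the point itself (distance 0) as the first.
    Requires 1 <= min_points <= len(points)."""
    result = []
    for xi, yi in points:
        dists = sorted(math.hypot(xi - xj, yi - yj) for xj, yj in points)
        result.append(dists[min_points - 1])  # dists[0] is the point itself
    return result

print(core_distances([(0, 0), (1, 0), (3, 0)], 2))  # [1.0, 1.0, 2.0]
```

The isolated point at x = 3 has the largest core distance, marking it as sitting in the sparsest region. With `min_points=1` every core distance is 0, which is why the point itself matters in the count.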
When using the HDBSCAN algorithm with an input layer containing more than 3 million features, the tool may fail unless your administrator increases the value of the javaHeapSize parameter on the GeoAnalyticsTools GP Service. Roughly 2 GB of heap space is needed per 3 million features. The amount of RAM specified by javaHeapSize should be available on each GeoAnalytics Server machine in addition to the 16 GB normally required by GeoAnalytics Server. For example, if you want to cluster 9 million features with HDBSCAN, you should set javaHeapSize to no less than 6144 MB, or 6 GB. In this case, each GeoAnalytics Server machine should have a total of at least 22 GB of RAM available.
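The sizing rule above is simple arithmetic, sketched here as a hypothetical helper. It assumes heap scales linearly at 2 GB per 3 million features and, as an added assumption, rounds up to whole gigabytes. The text states the rule only for inputs above 3 million features, so results for smaller inputs are not meaningful.

```python
import math

def hdbscan_memory_estimate(feature_count, gb_per_3_million=2, base_ram_gb=16):
    """Estimate javaHeapSize (MB) and total RAM per GeoAnalytics Server
    machine (GB) for HDBSCAN, on top of the 16 GB normally required."""
    heap_gb = math.ceil(feature_count / 3_000_000 * gb_per_3_million)
    return heap_gb * 1024, base_ram_gb + heap_gb

heap_mb, total_ram_gb = hdbscan_memory_estimate(9_000_000)
print(heap_mb, total_ram_gb)  # 6144 22
```

This reproduces the 9 million feature example: 6144 MB of heap and 22 GB of RAM per machine.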
To learn more, see the ArcGIS Pro documentation on How Density-based Clustering works.
ArcGIS API for Python example
The Find Point Clusters tool is available through ArcGIS API for Python.
This example finds clusters of retail locations.
```python
# Import the required ArcGIS API for Python modules
import sys
import arcgis
from arcgis.gis import GIS
from arcgis.geoanalytics import analyze_patterns
# Connect to your ArcGIS Enterprise portal and confirm that GeoAnalytics is supported
portal = GIS("https://myportal.domain.com/portal", "gis_publisher", "my_password", verify_cert=False)
if not portal.geoanalytics.is_supported():
    print("Quitting, GeoAnalytics is not supported")
    sys.exit(1)
# Find the big data file share dataset you'll use for analysis
search_result = portal.content.search("", "Big Data File Share")
# Look through the search results for a big data file share with the matching name
bd_file = next(x for x in search_result if x.title == "bigDataFileShares_RetailLocation")
# Look through the big data file share for points of sale
pos = next(x for x in bd_file.layers if x.properties.name == "POS")
# Set the tool environment settings
arcgis.env.verbose = True
# Run the Find Point Clusters tool with the Defined distance (DBSCAN) method:
# a point starts a cluster when at least min_feature_clusters points lie within
# 10 kilometers of it. The value 5 is illustrative; tune it for your data.
output = analyze_patterns.find_point_clusters(
    input_layer=pos, method="DBSCAN", min_feature_clusters=5,
    search_distance=10, distance_unit="Kilometers", output_name="POS_Clusters")
# Visualize the tool results if you are running Python in a Jupyter Notebook
processed_map = portal.map("USA")
processed_map.add_layer(output)
processed_map
```
Use Find Point Clusters to find clusters of point features in surrounding noise based on their spatial distribution. Other tools that may be useful are the following:
Map Viewer analysis tools
To determine if there is any statistically significant clustering in the spatial pattern of your data, use the Find Hot Spots tool.
To create a density map of your point or line features, use the Calculate Density tool.
To determine if there are any statistically significant outliers in the spatial pattern of your data, use the Find Outliers tool.
ArcGIS Desktop analysis tools
The Density-based Clustering geoprocessing tool performs the same function as Find Point Clusters.
The Find Point Clusters GeoAnalytics tool is available in ArcGIS Pro.