
Find Similar Locations

The Find Similar Locations tool identifies the candidate features that are most similar or dissimilar to one or more input features based on feature attributes.

Workflow diagram

Find Similar Locations workflow diagram

Analysis using GeoAnalytics Tools

Analysis using GeoAnalytics Tools is run using distributed processing across multiple ArcGIS GeoAnalytics Server machines and cores. GeoAnalytics Tools and standard feature analysis tools in ArcGIS Enterprise have different parameters and capabilities. To learn more about these differences, see Feature analysis tool differences.

Examples

  • Determine which of your stores are most similar to your top performers with regard to customer profiles.

  • Based on characteristics of the villages hit hardest by a disease, which other villages are high risk?

  • A town's after-school fitness program was extremely successful. Promoters want to find other towns with similar characteristics to serve as candidates for program expansion.

  • A crime analyst wants to search the database of all crimes to see if a recent crime may be part of a larger pattern or trend.

  • A human resources manager wants to justify company salary ranges. Once she identifies cities that are similar in terms of size, cost of living, and amenities, she will examine salary ranges for positions of interest and determine if they are in line with company salaries.

Usage notes

Tabular, point, line, or area features can be used.

The reference locations can be all of the features in the input layer or a selection of them. A selection can be made interactively using the Select button or with a filter using the Query button. Multiple features can be selected or unselected using the Select button. Only one query can be used to make a selection on the reference layer.

An input candidate layer is required. The features in the candidate layer will be ranked by similarity to the reference locations.

Ranked similarity is based on the fields specified in the Base similarity on parameter. More than one field can be specified. Only numeric fields with names matching the reference layer can be selected. Features with the lowest rank number have the highest similarity to the reference layer.

By default, all of the features in the candidate locations layer, up to a maximum of 10,000, will be ranked from most to least similar. The Show me parameter can be used to specify the number of features you want returned.

The Determine the most and least similar using parameter allows you to specify how features are matched. You can select field values or field profiles.

  • For field values, the most similar candidates will have the smallest sum of squared differences across all the fields you specify for the Base similarity on parameter; all values are standardized before differences are calculated.
  • For field profiles, the cosine similarity is measured. Cosine similarity looks for the same relationships among standardized attribute values rather than trying to match magnitudes. Suppose you specify three fields for the Base similarity on parameter, called A1, A2, and A3, where A2 is twice as large as A1 and A3 is almost equal to A2. For field profiles, the tool will look for candidates with those same attribute relationships: A2 twice as large as A1, and A3 almost equal to A2. Because this method looks at relationships between attributes, you must specify a minimum of two fields for the Base similarity on parameter. You might use the cosine similarity method (field profiles) to find places such as Los Angeles, but at a smaller scale overall, for example, when you are interested in the relative profile of population, number of cars, and number of residents under 20. The cosine similarity index ranges from 1.0 (perfect similarity) to -1.0 (perfect dissimilarity). The cosine similarity index is written to the cosimindex field of the output.
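The two matching methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, and z-scoring each field with the population standard deviation is an assumption about how values are standardized. For readability, the short demonstration at the end applies the measures to raw values.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every column of `rows` (a list of equal-length numeric lists).

    Assumption: population standard deviation; constant columns are left
    centered at zero instead of dividing by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


def sum_squared_differences(a, b):
    """Field values method: smaller results mean more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Field profiles method: 1.0 is the same profile, -1.0 the opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# A1, A2, A3 for a reference and for a candidate at ten times the scale.
reference = [10, 20, 19]
candidate = [100, 200, 190]

# The profile matches (cosine similarity is approximately 1.0) even though
# the magnitudes differ greatly (large sum of squared differences).
profile_score = cosine_similarity(reference, candidate)
value_score = sum_squared_differences(reference, candidate)
```

The contrast between the two scores is the point of the sketch: field profiles reward candidates that keep the same proportions between fields, while field values reward candidates whose magnitudes are close.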

All of the fields used for matching are written to the output. The Choose fields to add to results parameter allows you to specify additional fields to include in the output table. By default, all fields are added.

In addition to the matching fields and the fields to add to the results, the following fields are included in the output:

| Field name | Description | Notes |
|---|---|---|
| location_type | A string indicating whether features are from the reference layer or the search layer. | This field is always included in the output. |
| simrank | When you select most similar locations or most and least similar, all of the solution matches are ranked from most similar to least similar. The most similar solution match has a rank value of 1. | This field is only included when you select most similar or most and least. |
| dissimrank | When you select least similar locations or most and least similar, all of the solution matches are ranked from least similar to most similar. The solution that is least similar has a rank value of 1. | This field is only included when you select least similar or most and least. |
| simindex | This field quantifies how similar each solution match is to the reference feature. When you specify field values, this value represents the sum of squared value differences. | This field is only included when you select field values. |
| cosimindex | This field quantifies how similar each solution match is to the reference feature. When you specify field profiles, this value represents the cosine similarity. | This field is only included when you select field profiles. |
| labelrank | This field is used for display purposes only. The tool uses this field to provide default rendering of the analysis results. | This field is always included in the output. |
| reference_id | A unique ID value for reference features. Search features are given a null value. | This value was introduced at ArcGIS Enterprise 10.6.1. |
| search_id | A unique ID value for search features. Reference features are given a null value. | This value was introduced at ArcGIS Enterprise 10.6.1. |
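The relationship between the index fields and the rank fields can be sketched as follows. This is an illustration only: the add_ranks function and the dictionary records are hypothetical, and the handling of ties (input order is kept) is an assumption. With simindex, smaller values are more similar; with cosimindex, larger values are more similar, which is what the larger_is_more_similar flag is for.

```python
def add_ranks(candidates, index_field="simindex", larger_is_more_similar=False):
    """Return candidates ordered from most to least similar, with ranks added.

    simrank is 1 for the most similar candidate and dissimrank is 1 for the
    least similar candidate, mirroring the field descriptions above.
    """
    ordered = sorted(
        candidates,
        key=lambda c: c[index_field],
        reverse=larger_is_more_similar,
    )
    total = len(ordered)
    for position, candidate in enumerate(ordered, start=1):
        candidate["simrank"] = position
        candidate["dissimrank"] = total - position + 1
    return ordered


# Hypothetical search features scored with the field values method.
matches = [
    {"search_id": 1, "simindex": 4.0},
    {"search_id": 2, "simindex": 0.5},
    {"search_id": 3, "simindex": 9.0},
]
ranked = add_ranks(matches)
```

Here the feature with search_id 2 has the smallest sum of squared differences, so it receives a simrank of 1, while the feature with search_id 3 receives a dissimrank of 1.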

At ArcGIS Enterprise 10.6.1, a summary of similarity calculations is available on the item details page. The summary includes the following:

  • Summary of Input Features—A statistical summary of one or more features used as reference features. If more than one feature is used, this is the averaged value. Each field used in the calculation is represented as a row.
  • Summary of Attributes of Interest—A statistical summary of the search features. Each field used in the calculation is represented as a row.
  • The third table represents the search features that were most closely matched. This table will show a maximum of 50 features, although more features may have been matched. The table shows the search_id, simrank, and simindex values outlined in the above table.

Limitations

  • The reference layer and candidate layer must have at least one numeric field with a matching name.
  • When using the field profiles method, the reference layer and candidate layer must have at least two numeric fields with a matching name.
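These limitations can be checked before running the tool. The sketch below is a hypothetical helper, not part of any Esri API: it assumes each layer's schema has already been reduced to a dictionary of field names and simple type labels ("int", "float", "str"), which are invented here for illustration.

```python
NUMERIC_TYPES = {"int", "float"}  # hypothetical type labels


def matching_numeric_fields(reference_fields, candidate_fields):
    """Return the names of fields that are numeric in both layers."""
    return sorted(
        name
        for name, field_type in reference_fields.items()
        if field_type in NUMERIC_TYPES
        and candidate_fields.get(name) in NUMERIC_TYPES
    )


def check_fields(reference_fields, candidate_fields, method="field values"):
    """Raise ValueError if too few matching numeric fields are available."""
    required = 2 if method == "field profiles" else 1
    shared = matching_numeric_fields(reference_fields, candidate_fields)
    if len(shared) < required:
        raise ValueError(
            f"{method} needs at least {required} matching numeric field(s); "
            f"found {len(shared)}"
        )
    return shared


stores = {"median_income": "float", "population": "int", "name": "str"}
parcels = {"median_income": "float", "population": "int", "zoning": "str"}
usable = check_fields(stores, parcels, method="field profiles")
```

In this example, median_income and population are numeric in both layers, so either matching method could be used.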

How Find Similar Locations works

To use Find Similar Locations, you provide the reference locations, the candidate search locations, and the fields representing the criteria you want to match. The layer you select for analysis should contain your reference or benchmark locations. For example, your reference locations might be a layer containing your top performing stores or the villages hardest hit by a disease. You then specify the layer containing your candidate search locations. This might be all of your stores or all other villages. Finally, you identify one or more fields to use for measuring similarity. Find Similar Locations will then rank all of the candidate search locations by how closely they match your reference locations across all of the fields you selected.

In some cases, your analysis layer will contain both the reference locations and the candidate search locations. You may have a single layer containing all of your stores, for example, and want to rank them from most to least similar to your top performing store. Use your stores layer as both your analysis layer and your candidate search layer. You must then identify which store is your top performer, either by selecting it interactively or by building a query. Alternatively, create a copy of the stores layer so there are two versions in the table of contents. Click the filter button under the first copy and define a filter to select your top performer. Then click the filter button under the second copy and define a filter to select the candidate search locations (which may be all of the stores except your top performer). The first layer is your analysis layer: click Perform Analysis below the layer or the Analysis button at the top of your map, and browse to Find Similar Locations by expanding the Find Locations category. Specify the second layer for the Search for similar locations in parameter. These are your candidate search locations.

In other cases, you will have separate reference and candidate search layers. You may have a stores layer that includes your top performer with fields describing the store customer base (fields such as median income and marital status for example) and a second layer of candidate parcels from which you will determine the best location to build a new store. In this case, if the reference locations layer includes more than your reference locations, you must first identify the reference locations using one of the selection tools described above. If your layer only includes your reference locations (your top performing store for example), you do not need to make a selection. You would specify your parcels layer for the candidate search locations (parameter two). If both the parcels and your top performing store have fields describing the customer base, you can run Find Similar Locations to identify candidate parcels with demographic characteristics most like the customers for your best performing store.

If there is more than one reference location, similarity will be based on averages for the fields you specify. For example, if there are two reference locations and you are interested in matching population, the tool will look for candidate search locations with populations that are most like the average population of the two reference locations. If the values for the reference locations are 100 and 102, for example, the tool will look for candidate search locations with populations near 101. Consequently, you will want to choose reference locations that have similar values for the fields you specify. If, for example, the population value for one reference location is 100 and the other is 100,000, the tool will look for candidate search locations with population values near the average of those two values: 50,050. Notice that this averaged value is not close to the population of either reference location.
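The averaging described above can be reproduced with a few lines of Python. The average_reference function and the dictionary records are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def average_reference(reference_rows, fields):
    """Average each field across all reference locations."""
    return {field: mean(row[field] for row in reference_rows) for field in fields}


# Two reference locations with similar populations: the average (101)
# is representative of both.
close = average_reference(
    [{"population": 100}, {"population": 102}], ["population"]
)

# Two reference locations with very different populations: the average
# (50,050) is not close to either one.
far_apart = average_reference(
    [{"population": 100}, {"population": 100_000}], ["population"]
)
```

When the reference values differ this much, it may be more useful to run the tool separately for each reference location.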

ArcGIS API for Python example

The Find Similar Locations tool is available through ArcGIS API for Python.

This example finds potential retail locations based on the current top locations and their attributes.

# Import the required ArcGIS API for Python modules
import arcgis
from arcgis.gis import GIS
from arcgis.geoanalytics import find_locations

# Connect to your ArcGIS Enterprise portal and confirm that GeoAnalytics is supported
portal = GIS("https://myportal.domain.com/portal", "gis_publisher", "my_password", verify_cert=False)
if not portal.geoanalytics.is_supported():
    print("Quitting, GeoAnalytics is not supported")
    exit(1)   

# Find the feature layer containing the stores and filter to obtain stores in the top percentile
stores_results = portal.content.search("Stores", "Feature Layer")
stores_layer = stores_results[0].layers[0]
stores_layer.filter = "top_percentile = 'true'"

# Find the feature layer you'll use to search for similar locations
search_results = portal.content.search("PotentialLocations", "Feature Layer")
locations = search_results[0].layers[0]

# Run the Find Similar Locations tool
similar_location_result = find_locations.find_similar_locations(input_layer = stores_layer,
                                                                search_layer = locations,
                                                                analysis_fields = "median_income, population, nearest_competitor",
                                                                most_or_least_similar = "MostSimilar", 
                                                                match_method = "AttributeValues", 
                                                                number_of_results = 50, 
                                                                output_name = "similar_locations")

# Visualize the tool results if you are running Python in a Jupyter Notebook
processed_map = portal.map('Europe')
processed_map.add_layer(similar_location_result)
processed_map

Similar tools

Use Find Similar Locations to measure the similarity of locations in a candidate and reference layer. Other tools may be useful in solving similar but slightly different problems.

Map Viewer analysis tools

To find similar locations using the standard analysis tools, see Find Similar Locations.

If you are trying to select existing locations with a query, use the standard Find Existing Locations tool.

If you are trying to use a query to create new features, use the standard Derive New Locations tool.

ArcGIS Desktop analysis tools

The GeoAnalytics Tools version of Find Similar Locations is available in ArcGIS Pro.