Understanding a big data file share manifest

Big data file shares are registered as a data store through ArcGIS Server Manager. They require a manifest that outlines the schema of the data, as well as the fields that represent geometry and time in the dataset. The manifest is automatically generated when you register a big data file share, but you may need to modify it if your data changes or if manifest generation could not determine all the information needed (for example, if the automatically generated manifest did not select the correct field for the geometry or time).

Note:

Editing your big data file share is an advanced option. To learn more about applying changes to individual datasets in your manifest, see Edit big data file share manifests in Manager. To learn about applying a hints file for delimited files, see Understanding the hints file.

The manifest is composed of datasets. The number of datasets depends on the number of folders your big data file share contains. In the following example, there are five datasets:

"datasets":[
  {.. dataset1 ..},
  {.. dataset2 ..},
  {.. dataset3 ..},
  {.. dataset4 ..},
  {.. dataset5 ..}
]
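
To make the structure concrete, the following Python sketch parses a manifest and lists its datasets. It assumes the datasets array sits inside a top-level JSON object; the sample manifest text and the function name list_datasets are hypothetical and only serve as an illustration.

```python
import json

# Hypothetical manifest text; assumes "datasets" is a key of a top-level JSON object.
manifest_text = """
{
  "datasets": [
    {"name": "dataset1", "format": {}, "schema": {}},
    {"name": "dataset2", "format": {}, "schema": {}}
  ]
}
"""


def list_datasets(text):
    """Parse manifest JSON and return its list of dataset objects (empty if absent)."""
    manifest = json.loads(text)
    return manifest.get("datasets", [])


datasets = list_datasets(manifest_text)
print(len(datasets), [dataset["name"] for dataset in datasets])
```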

Within each dataset, there are five top-level objects that may be applicable. Of these objects, name, format, and schema are required.

{
 "name": "dataset1",
 "format": {},
 "schema": {},
 "geometry": {},
 "time": {}
}
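
As a rough illustration of the required and optional objects, the following Python sketch reports which required keys a dataset entry lacks. Only the key names come from the manifest; the function names and the sample dictionary are hypothetical and not part of ArcGIS.

```python
# Key names taken from the manifest description; everything else is illustrative.
REQUIRED_KEYS = ("name", "format", "schema")
OPTIONAL_KEYS = ("geometry", "time")


def missing_required_keys(dataset):
    """Return the required top-level keys that a dataset dict does not contain."""
    return [key for key in REQUIRED_KEYS if key not in dataset]


def unrecognized_keys(dataset):
    """Return keys that are not one of the five top-level objects."""
    known = REQUIRED_KEYS + OPTIONAL_KEYS
    return [key for key in dataset if key not in known]


# Hypothetical dataset entry that is missing its schema object.
incomplete = {"name": "dataset1", "format": {}, "geometry": {}}
print(missing_required_keys(incomplete))
```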

Name

The name object is required and defines the name of the dataset. This must be unique within the manifest.
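
If you edit a manifest by hand, a quick check such as the following Python sketch can flag dataset names that appear more than once. The function name duplicate_dataset_names and the sample data are hypothetical; the sketch assumes the datasets have already been loaded as a list of dictionaries.

```python
from collections import Counter


def duplicate_dataset_names(datasets):
    """Return, in sorted order, every dataset name used by more than one dataset."""
    counts = Counter(dataset["name"] for dataset in datasets if "name" in dataset)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical datasets: "tracks" is used twice, which the manifest does not allow.
sample = [{"name": "tracks"}, {"name": "storms"}, {"name": "tracks"}]
print(duplicate_dataset_names(sample))
```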

Format

The format object is required and defines the dataset type and its format.

Syntax

"format" : {
 "type" :  "< delimited | shapefile | orc | parquet >",
 "extension" : "< csv | tsv | shp | orc | parquet >",
 "fieldDelimiter" : "< delimiter >",
 "recordTerminator" : "< terminator >",
 "quoteChar":  "< character for quotes>",
 "hasHeaderRow" :  < true | false >, 
 "encoding" : "< encoding format >"
}

Example using a shapefile:

"format" : {
 "type": "shapefile",
 "extension": "shp"
}

Example using a delimited file:

"format" : {
 "type": "delimited",
 "extension": "csv",
 "fieldDelimiter": ",",
 "recordTerminator": "\n",
 "quoteChar": "\"",
 "hasHeaderRow": true,
 "encoding" : "UTF-8"
}

Description

  • type—A required property that defines the source data. This can be delimited, shapefile, parquet, or orc.
  • extension—A required property denoting the file extension. Shapefiles use shp, delimited files use the file extension of the data (for example, csv or tsv), ORC files use orc, and parquet files use parquet.
  • fieldDelimiter—This is required when type is delimited. This field represents what separates fields in the delimited file.
  • recordTerminator—This is only required when type is delimited. This field specifies what terminates features in the delimited file.
  • quoteChar—This is only required when type is delimited. The character denotes how quotes are specified in the delimited file.
  • hasHeaderRow—This is only required when type is delimited. This property specifies if the first row in a delimited file should be treated as a header or as the first feature.
  • encoding—This is only required when type is delimited. This property specifies the type of encoding used.
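
The rules above can be sketched as a small Python check. The property names and allowed type values come from the description; the function name check_format and the wording of the messages are hypothetical.

```python
# Property names and type values as listed in the description above.
KNOWN_TYPES = ("delimited", "shapefile", "orc", "parquet")
DELIMITED_ONLY = ("fieldDelimiter", "recordTerminator", "quoteChar",
                  "hasHeaderRow", "encoding")


def check_format(fmt):
    """Return a list of problems found in a format object (empty if none)."""
    problems = []
    for key in ("type", "extension"):
        if key not in fmt:
            problems.append(f"missing required property: {key}")
    if "type" in fmt and fmt["type"] not in KNOWN_TYPES:
        problems.append(f"unknown type: {fmt['type']}")
    # The remaining properties are only required for delimited files.
    if fmt.get("type") == "delimited":
        for key in DELIMITED_ONLY:
            if key not in fmt:
                problems.append(f"missing delimited property: {key}")
    return problems


print(check_format({"type": "shapefile", "extension": "shp"}))
```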

Schema

The schema object is required; it defines the dataset's fields and their types.

Syntax

"schema" : {
 "fields" : [
  {
   "name": "< fieldName >",
   "type" : "< esriFieldTypeString | 
      esriFieldTypeBigInteger | 
      esriFieldTypeDouble | 
      esriFieldTypeDate >"
  }
 ]
}

Example:

"schema" : {
 "fields":[
  {
   "name": "trackid",
   "type": "esriFieldTypeString"
  },
  {
   "name": "x",
   "type": "esriFieldTypeDouble"
  },
  {
   "name": "y",
   "type": "esriFieldTypeDouble"
  },
  {
   "name": "time",
   "type": "esriFieldTypeBigInteger"
  },
  {
   "name": "value",
   "type": "esriFieldTypeBigInteger"
  }
 ]
}

Description

  • fields—A required property that defines the fields in the schema.
  • name—A required property denoting the field name. The field name must be unique to the dataset, and it can only contain alphanumeric characters and underscores.
  • type—This is a required property that defines the type of the field. Options include the following:
    • esriFieldTypeString—For strings
    • esriFieldTypeDouble—For doubles or floats
    • esriFieldTypeBigInteger—For integers
    • esriFieldTypeDate—For shapefiles with date fields. Delimited, ORC, and parquet datasets with fields representing a date must have dates represented by an esriFieldTypeString field.
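
The field rules can be checked with a short Python sketch such as the following. The function name check_schema_fields is hypothetical, and the sketch assumes that "alphanumeric" means ASCII letters and digits, which the description does not state explicitly.

```python
import re

# Field types listed in the description above.
FIELD_TYPES = {
    "esriFieldTypeString",
    "esriFieldTypeDouble",
    "esriFieldTypeBigInteger",
    "esriFieldTypeDate",
}
# Assumption: alphanumeric is interpreted as ASCII letters and digits.
FIELD_NAME = re.compile(r"[A-Za-z0-9_]+")


def check_schema_fields(fields):
    """Return a list of problems found in a schema's fields array."""
    problems = []
    seen = set()
    for field in fields:
        name = field.get("name", "")
        if not FIELD_NAME.fullmatch(name):
            problems.append(f"invalid field name: {name!r}")
        if name in seen:
            problems.append(f"duplicate field name: {name!r}")
        seen.add(name)
        if field.get("type") not in FIELD_TYPES:
            problems.append(f"unknown field type for {name!r}: {field.get('type')!r}")
    return problems


print(check_schema_fields([{"name": "track id", "type": "esriFieldTypeString"}]))
```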

Geometry

The geometry object is optional in general, but it is required when a dataset has a spatial representation, such as a point, polyline, or polygon.

Syntax

"geometry" : {
 "geometryType" : "< esriGeometryType >",
 "spatialReference" : {
  "wkid": <wkidNum>,
  "latestWkid" : <latestWkidNum>
  },
 "fields": [
 {
  "name": "<fieldName1>",
  "formats": ["<fieldFormat1>"]
 },
 {
  "name": "<fieldName2>",
  "formats": ["<fieldFormat2>"]
 }
 ]
}

Example using a delimited file with x and y values:

"geometry" : {
 "geometryType" : "esriGeometryPoint",
 "spatialReference" : {
  "wkid": 3857
 },
 "fields": [
 {
  "name": "XValue",
  "formats": ["x"]
 },
 {
  "name": "YValue",
  "formats": ["y"]
 }
 ]
}

Example using a delimited file with x, y, and z values:

"geometry" : {
 "geometryType" : "esriGeometryPoint",
 "spatialReference" : {
  "wkid": 4326
 },
 "fields": [
 {
  "name": "Longitude",
  "formats": ["x"]
 },
 {
  "name": "Latitude",
  "formats": ["y"]
 },
 {
  "name": "Height",
  "formats": ["z"]
 }
 ]
}

Example using a .tsv file:

"geometry" : {
 "geometryType" : "esriGeometryPolygon",
 "dropSourceFields": true,
 "spatialReference" : {
  "wkid": 3857
 },
 "fields": [
 {
  "name": "Shapelocation",
  "formats": ["WKT"]
 }
 ]
}

Description

Note:

Since the geometry object is optional, the following properties are listed as required or optional, assuming that a geometry is used:

  • geometryType—This is required. Options include the following:
    • esriGeometryPoint
    • esriGeometryPolyline
    • esriGeometryPolygon
  • spatialReference—A required property denoting the spatial reference of the dataset.
    • wkid—A field that denotes the spatial reference, where wkid or latestWkid is required for a dataset with a geometry.
    • latestWkid—A field that denotes the spatial reference at a given software release, where wkid or latestWkid is required for a dataset with geometry.
  • fields—A required property for delimited datasets with a spatial representation. This denotes the field name or names and formats of the geometry.
    • name—A required property for delimited datasets with a spatial representation. This denotes the name of the field used to represent the geometry. There can be multiple instances of this.
    • formats—A required property for delimited datasets with a spatial representation. This denotes the format of the field used to represent the geometry. There can be multiple instances of this.
  • dropSourceFields—An optional property for datasets with fields representing the geometry. This denotes whether the fields used to specify the geometry are dropped rather than kept as fields in analysis. If set to true, the fields used for geometry will not be visible as analysis fields (such as in summary statistics) and will be dropped when running tools. The default is false. This property cannot be set on shapefile datasets.
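
When the same point layout recurs across several delimited datasets, the geometry object can be generated rather than typed by hand. The following Python sketch builds an object shaped like the x, y, and z examples above. The helper name point_geometry and its parameters are hypothetical; only the property names and values come from the manifest.

```python
import json


def point_geometry(x_field, y_field, wkid, z_field=None, drop_source_fields=False):
    """Build a geometry object for a delimited dataset with x, y, and optional z fields."""
    fields = [
        {"name": x_field, "formats": ["x"]},
        {"name": y_field, "formats": ["y"]},
    ]
    if z_field is not None:
        fields.append({"name": z_field, "formats": ["z"]})
    geometry = {
        "geometryType": "esriGeometryPoint",
        "spatialReference": {"wkid": wkid},
        "fields": fields,
    }
    # dropSourceFields defaults to false, so it is only written when enabled.
    if drop_source_fields:
        geometry["dropSourceFields"] = True
    return geometry


print(json.dumps(point_geometry("Longitude", "Latitude", 4326, z_field="Height"), indent=1))
```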

Time

The time object is optional in general, but it is required when a dataset has a temporal representation.

Syntax

"time" : {
 "timeType" : "< instant | interval >",
 "timeReference" : {
  "timeZone" : "< timeZone >"
 },
 "fields": [
  {
   "name": "<fieldName1>",
   "formats": ["<fieldFormat1>"],
   "role": "< start | end >"
  }
 ]
}

Example using an instant with multiple formats in the time fields:

"time": {
 "timeType": "instant",
 "timeReference": {"timeZone": "UTC"},
 "fields": [
 {
  "name": "iso_time",
  "formats": [
   "yyyy-MM-dd HH:mm:ss",
   "MM/dd/yyyy HH:mm"
   ]
  }
 ]
}

Example using an interval, with multiple fields used for the start time:

"time": {
 "timeType": "interval",
 "timeReference": {"timeZone": "-0900"},
 "dropSourceFields" : true,
 "fields": [
 {
  "name": "time_start",
  "formats": ["HH:mm:ss"],
  "role" : "start"
  },
 {
  "name": "date_start",
  "formats": ["yyyy-MM-dd"],
  "role" : "start"
  },
 {
  "name": "datetime_ending",
  "formats": ["yyyy-MM-dd HH:mm:ss"],
  "role" : "end"
  }
 ]
}

Description

Note:

Since the time object is optional, the following properties are listed as required or optional, assuming that time is used:

  • timeType—The time type is required if there is time included in the dataset. Options include the following:
    • instant—For a single moment in time
    • interval—For a time interval represented by a start and stop time
  • timeReference—A required field if the dataset is time-enabled, denoting the time zone (timeZone).
    • timeZone—A required field of timeReference that denotes the time zone format of the data. Time zones are based on Joda-Time. To learn about Joda-Time formats, see Joda-Time Available Time Zones. timeZone can be formatted as follows:
      • Using the full name of the time zone: Pacific Standard Time.
      • Using the time zone offset expressed in hours: -0100 or -01:00.
      • You may use time zone abbreviations for UTC or GMT only; otherwise, use the full name or the hours offset.
  • fields—A required field to denote the field names and formats of the time. Required properties of fields are as follows:
    • name—A required field that denotes the name of the field used to represent time. There may be multiple instances of this object.
    • formats—A required field that denotes the format of the field used to represent the time. There may be multiple formats for a single field (as shown above). There may be multiple instances of this object. To learn how fields may be formatted, see Time formats in a big data file share manifest.
    • role—A required field when timeType is interval. Its value is either start or end, marking the field as part of the start time or end time of the interval.
  • dropSourceFields—An optional property for datasets with fields representing the time. This denotes whether the fields used to specify the time are dropped rather than kept as fields in analysis. If set to true, the fields used for time will not be visible as analysis fields (such as in summary statistics) and will be dropped when running tools. The default is false.
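
The requirements above can be summarized in a short Python sketch that checks a time object. The function name check_time and the message wording are hypothetical. The sketch does not validate time zone names or format patterns, and it does not check that an interval has both a start and an end field.

```python
def check_time(time):
    """Return a list of problems found in a time object (empty if none)."""
    problems = []
    time_type = time.get("timeType")
    if time_type not in ("instant", "interval"):
        problems.append(f"timeType must be instant or interval, got {time_type!r}")
    if "timeZone" not in time.get("timeReference", {}):
        problems.append("timeReference.timeZone is missing")
    fields = time.get("fields", [])
    if not fields:
        problems.append("fields is missing or empty")
    for field in fields:
        name = field.get("name")
        if not name:
            problems.append("a time field has no name")
        if not field.get("formats"):
            problems.append(f"field {name!r} has no formats")
        # role is only required when timeType is interval.
        if time_type == "interval" and field.get("role") not in ("start", "end"):
            problems.append(f"field {name!r} needs a role of start or end")
    return problems


# Hypothetical interval definition whose only field lacks a role.
print(check_time({
    "timeType": "interval",
    "timeReference": {"timeZone": "UTC"},
    "fields": [{"name": "time_start", "formats": ["HH:mm:ss"]}],
}))
```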