ArcGIS GeoEvent Server uses Apache Kafka to manage all event traffic from inputs to GeoEvent Services and from GeoEvent Services to outputs. Kafka provides a set of topics (message queues) to which events are published and to which consumers subscribe. The topic queues are kept on disk for persistent storage and so that they can be recovered after a system failure.
GeoEvent Server Kafka topic basics
Each GeoEvent Server input and output has its own Kafka topic.
Each Kafka topic is divided into partitions, which split the events into separate message queues that can be processed in parallel. By default, each topic is created with three partitions. A subscriber of the topic starts several event consumers that run in parallel to improve performance.
Note that Kafka also creates and manages a large set of partitions for its consumer offset topic. This large number of partitions contributes to the system's performance through parallelism.
General disk size recommendations
For a new installation of GeoEvent Server, the ArcGIS GeoEvent Gateway service requires at least 1 GB of disk space. Each input or output you add requires a minimum of 360 MB of additional disk space before you process any events. Note that all of these sizes are minimum estimates and are likely to grow as you configure more elements in GeoEvent Server.
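The arithmetic above can be sketched in a few lines of Python. The function name is hypothetical, and treating 1 GB as 1024 MB is an assumption made for this sketch; the result is a floor, not a forecast.

```python
GATEWAY_BASE_MB = 1024  # assumption: the 1 GB Gateway minimum taken as 1024 MB
PER_ELEMENT_MB = 360    # minimum per input or output, before any events flow


def baseline_disk_mb(num_inputs: int, num_outputs: int) -> int:
    """Return the minimum disk space in MB before any events are processed."""
    if num_inputs < 0 or num_outputs < 0:
        raise ValueError("counts must be non-negative")
    return GATEWAY_BASE_MB + PER_ELEMENT_MB * (num_inputs + num_outputs)


# Example: a site with 4 inputs and 6 outputs
print(baseline_disk_mb(4, 6), "MB minimum")
```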
GeoEvent Server Kafka settings
You can modify the behavior of the Kafka instance for GeoEvent Server by editing the Kafka properties file. The primary reason for modifying this file is to change the location of the Kafka files on disk. On rare occasions, however, other properties may also need to be updated.
The Kafka properties file
The properties file that contains the Kafka settings for GeoEvent Server (kafka.properties) can be found in one of the following directories, depending on your operating system.
- Windows (default) - C:\Program Files\ArcGIS\server\geoevent\gateway\etc\kafka.properties
- Linux (default) - /home/arcgis/server/GeoEvent/gateway/etc/kafka.properties
The default settings in this file are set to optimize performance at the expense of increased disk usage.
The Kafka topics in GeoEvent Server are stored in one of the following directories depending on your operating system.
- Windows (default) - C:\ProgramData\ESRI\GeoEvent-Gateway\kafka\logs\
- Linux (default) - /home/arcgis/.esri/GeoEvent-Gateway/config.[machine name]/kafka/logs (e.g. /home/arcgis/.esri/GeoEvent-Gateway/config.gesdev01/kafka/logs)
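To keep an eye on how much space the topic folders actually use, a short Python helper can total the file sizes under a directory. This is a minimal sketch using only the standard library; the function name is hypothetical, and the path you pass in is whichever of the directories above applies to your installation.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # A log file may be deleted between listing and sizing it
                pass
    return total


# Usage (path is an example; substitute your own topic directory):
# print(dir_size_bytes(r"C:\ProgramData\ESRI\GeoEvent-Gateway\kafka\logs") / 1024**2, "MB")
```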
To change the storage location of the Kafka topics, update the following properties depending on your operating system.
Windows default properties:
Linux default properties:
- gateway.data.dir=/home/arcgis/.esri/GeoEvent-Gateway/config.[machine name] (e.g. /home/arcgis/.esri/GeoEvent-Gateway/config.gesdev01)
In GeoEvent Server, the default number of topic partitions is 3. Thus, if you inspect the folder where your topics are stored, you will find three folders per topic that share the same name and end in a partition index. Kafka numbers partitions starting at zero, so the suffixes are typically -0, -1, and -2. Inside each partition folder, Kafka maintains a log of all the data in that topic partition at that moment. To change the number of topic partitions, modify the corresponding property in the Kafka properties file.
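When inspecting the topic storage folder, it can help to group the partition folders by topic. The sketch below assumes folder names of the form `<topic>-<index>`; the function name and the sample folder names are hypothetical.

```python
from collections import defaultdict


def group_partition_dirs(dir_names):
    """Group partition folder names by topic.

    Each folder is expected to be named '<topic>-<partition index>'.
    Names that do not end in '-<integer>' are ignored.
    """
    topics = defaultdict(list)
    for name in dir_names:
        # Split on the last hyphen only, since topic names may contain hyphens
        topic, sep, index = name.rpartition("-")
        if sep and topic and index.isdigit():
            topics[topic].append(int(index))
    return {topic: sorted(indexes) for topic, indexes in topics.items()}


# Hypothetical folder listing, e.g. the result of os.listdir() on the topic directory
sample = ["my-input-0", "my-input-1", "my-input-2", "my-output-0", "notes"]
print(group_partition_dirs(sample))
```

A topic that shows fewer folders than the configured partition count may be worth a closer look.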
Topic partition file sizes
By default, each Kafka topic partition log starts at a minimum size of 20 MB and grows to a maximum of 100 MB on disk before a new log file is created. A partition can hold multiple log files at any one time. In extreme cases of high-velocity event streams, each topic partition folder can grow to 3 to 4 times the maximum log file size (300 MB to 400 MB). For a single topic with three partitions, the total disk space can therefore reach 900 MB to 1200 MB at any given time. Multiply that maximum by the number of inputs and outputs you have configured to get the disk space you need to have available for Kafka in GeoEvent Server. The segment bytes property controls the maximum size of a log file before Kafka rolls over to a new file (the default is 100 MB).
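The sizing rule in this section can be written as a small calculation. This is a sketch under the figures stated in the text (three partitions, 100 MB log files, up to four files per partition); the function and parameter names are hypothetical.

```python
def worst_case_kafka_mb(num_topics: int, partitions: int = 3,
                        segment_mb: int = 100, segments_per_partition: int = 4) -> int:
    """Estimate peak disk use in MB for Kafka topics.

    num_topics is the number of configured inputs plus outputs, since
    each input and output has its own topic.
    """
    return num_topics * partitions * segment_mb * segments_per_partition


print(worst_case_kafka_mb(1))                            # one topic, 4 files per partition
print(worst_case_kafka_mb(1, segments_per_partition=3))  # one topic, 3 files per partition
print(worst_case_kafka_mb(10))                           # ten inputs and outputs combined
```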
With high-velocity data, you could end up with multiple 100 MB log files; otherwise, you might have only one. The lower the velocity of your event data, the smaller you can set this property; the higher the velocity, the larger you should set it. If you set the size too small, Kafka will continually roll over files. If you set it too large, Kafka will rarely roll over to a new log file, and old events will be kept in the queue longer than necessary.
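One way to reason about this trade-off is to estimate how long a partition takes to fill one log file at a given event rate. The sketch below is a back-of-the-envelope model, not something Kafka reports; the function name is hypothetical, and the event rate and average event size are inputs you would have to measure yourself.

```python
def seconds_until_roll(events_per_second: float, avg_event_bytes: float,
                       segment_bytes: int = 100 * 1024 * 1024) -> float:
    """Estimate the seconds needed for one partition to fill one log file.

    events_per_second is the rate reaching a single partition, roughly the
    topic rate divided by the number of partitions.
    """
    if events_per_second <= 0 or avg_event_bytes <= 0:
        return float("inf")  # no traffic: the file never fills by size alone
    return segment_bytes / (events_per_second * avg_event_bytes)


# Example: 1024 events per second of 1 KiB each against a 100 MiB log file
print(seconds_until_roll(1024, 1024))
```

If the estimate is only a few seconds, the log file size is probably too small for the stream; if it runs to many hours, a smaller size may free disk space sooner.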
Another setting that affects how much disk space a Kafka topic partition consumes is retention bytes. This property instructs Kafka to always keep a minimum amount of data. By default, the value is 100 MB, so even if Kafka decides it can and should delete old data, the remaining data will never be smaller than 100 MB. As with the segment bytes property above, if you are working with lower data velocities, you can lower the value of this property. When working with higher-velocity data, use the default of 100 MB.
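The effect of retention bytes can be illustrated with a simplified model: the oldest log files are dropped only while the data left behind stays at or above the threshold. This is a sketch of the behavior described above, not Kafka's actual implementation, and the function name is hypothetical.

```python
MB = 1024 * 1024


def prune_by_size(segment_sizes, retention_bytes=100 * MB):
    """Return the log file sizes kept after size-based cleanup.

    segment_sizes lists file sizes in bytes, oldest first; the last entry
    is the file currently being written and is never removed.
    """
    kept = list(segment_sizes)
    total = sum(kept)
    # Drop the oldest file only if what remains is still >= retention_bytes
    while len(kept) > 1 and total - kept[0] >= retention_bytes:
        total -= kept.pop(0)
    return kept


# Three full 100 MB files plus a 40 MB file still being written
kept = prune_by_size([100 * MB, 100 * MB, 100 * MB, 40 * MB])
print([size // MB for size in kept])
```

In this model, the data remaining after cleanup is never smaller than the threshold unless the partition held less than that to begin with.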
Topic partition file management
As subscribers consume events from a partition's queue, events become stale once all subscribers have marked them as consumed. How long Kafka keeps old messages, and how often it cleans them out, are both configurable in the Kafka properties file.
The default retention policy is 1 hour. Any data files older than 1 hour and not currently storing active data will be deleted. If the file is still being actively used to store data (as might be the case with low volume/velocity data) it will not be deleted. Kafka will check to delete old data files every 30 seconds by default.
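The retention rule described above can be modeled as a simple filter over data files. This is an illustrative sketch, not Kafka's implementation; the `Segment` tuple, its field names, and the sample file names are hypothetical.

```python
from collections import namedtuple

# name: file name; last_modified: seconds since epoch; active: still being written
Segment = namedtuple("Segment", ["name", "last_modified", "active"])

RETENTION_SECONDS = 60 * 60  # default retention of 1 hour, per the text


def expired_segments(segments, now, retention_seconds=RETENTION_SECONDS):
    """Return names of data files that are older than the retention period
    and are not actively storing data."""
    return [
        s.name
        for s in segments
        if not s.active and now - s.last_modified > retention_seconds
    ]


now = 10_000.0
files = [
    Segment("old.log", 1_000.0, False),     # 9000 s old, inactive: eligible
    Segment("recent.log", 9_000.0, False),  # 1000 s old: kept
    Segment("active.log", 0.0, True),       # old but still in use: kept
]
print(expired_segments(files, now))
```

In the behavior described above, a check like this would run on the cleanup interval (every 30 seconds by default).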
Optional topic partition file management properties
The properties below can be added to your Kafka properties file to increase the rate at which the partition files are rolled over. This may improve disk space utilization, but it might also impact performance if the rate is set too high.
The first property instructs Kafka to roll a data file over, essentially replacing it with a new one, every 30 minutes. If you add this setting, Kafka will create a new data file every 30 minutes regardless of the size of the old data file. For low-velocity data streams, this can prevent Kafka from having to maintain older data when the data file does not fill up very often. The second property determines how persistent Kafka is about rolling the data file over. The recommended value is 3 minutes, meaning Kafka will check every 3 minutes whether it needs to roll the data file over.
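Taken together, the size limit and the time limit mean a data file rolls over when either threshold is reached. The sketch below models that decision using the values from the text (100 MB and 30 minutes); it is a simplification for illustration, and the function and parameter names are hypothetical rather than Kafka property names.

```python
MAX_SEGMENT_BYTES = 100 * 1024 * 1024  # size limit from the text (100 MB)
ROLL_INTERVAL_MS = 30 * 60 * 1000      # time limit from the text (30 minutes)


def should_roll(bytes_used: int, age_ms: int,
                max_bytes: int = MAX_SEGMENT_BYTES,
                roll_ms: int = ROLL_INTERVAL_MS) -> bool:
    """Return True if a data file should be replaced with a new one,
    either because it is full or because it is old enough."""
    return bytes_used >= max_bytes or age_ms >= roll_ms


# A low-velocity stream: only 2 MB written, but the file is 31 minutes old
print(should_roll(2 * 1024 * 1024, 31 * 60 * 1000))
```

For a low-velocity stream the time limit is what triggers the rollover, which is what lets the older file age out under the retention policy.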