Framework Metrics

AxonIQ Console gathers metrics on your Axon Framework applications. These metrics provide information about the performance of your applications and potential problems.

Measurements: Timers

All metrics that gather times are measured in milliseconds. AxonIQ Console measures the minimum, median, ninetieth percentile, and maximum values for these metrics. Pay particular attention to the ninetieth percentile: 90% of the measurements fall below this value, which makes it a good indication of the performance of your application. You can see an example below.

Example of a timer graph in AxonIQ Console

Individual measurements of timers count toward their percentiles for 5 minutes, after which they expire and no longer count toward the percentiles. This means that the percentiles are always based on the last 5 minutes of data.
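The expiry behavior can be illustrated with a small sketch. This is not AxonIQ Console's implementation: the `SlidingTimer` class, its method names, and the nearest-rank percentile method are assumptions made for the example.

```python
import math
from collections import deque

WINDOW_SECONDS = 5 * 60  # measurements count toward percentiles for 5 minutes


class SlidingTimer:
    """Hypothetical timer that keeps only the last 5 minutes of measurements."""

    def __init__(self):
        self._samples = deque()  # (timestamp_seconds, duration_ms), oldest first

    def record(self, now, duration_ms):
        self._samples.append((now, duration_ms))

    def _expire(self, now):
        # Drop measurements that are 5 minutes old or older.
        while self._samples and now - self._samples[0][0] >= WINDOW_SECONDS:
            self._samples.popleft()

    def percentile(self, now, p):
        """Nearest-rank percentile (an assumed method) over the live window."""
        self._expire(now)
        values = sorted(duration for _, duration in self._samples)
        if not values:
            return None
        rank = max(1, math.ceil(p * len(values) / 100))
        return values[rank - 1]


timer = SlidingTimer()
for second in range(1, 11):
    timer.record(now=second, duration_ms=second * 10)  # 10, 20, ..., 100 ms

print(timer.percentile(now=10, p=90))   # 90: all ten measurements count
print(timer.percentile(now=305, p=90))  # 100: only the measurements from t=6..10 remain
```

Once every measurement is older than 5 minutes, the sketch returns `None`, because nothing is left to base a percentile on.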

Measurements: Rates

All metrics that gather rates are measured in counts per minute. Only items counted in the last 10 seconds contribute to the rate: the count over those 10 seconds is multiplied by 6 to get a rate per minute. You can see an example below.

Example of a rate graph in AxonIQ Console
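The rate arithmetic described above (the count over the last 10 seconds, multiplied by 6) can be sketched as follows. The `RateMeter` class and its method names are hypothetical and only illustrate the calculation; they are not part of AxonIQ Console.

```python
from collections import deque

WINDOW_SECONDS = 10                       # only the last 10 seconds count
PER_MINUTE_FACTOR = 60 // WINDOW_SECONDS  # 6


class RateMeter:
    """Hypothetical meter that reports a per-minute rate from a 10-second window."""

    def __init__(self):
        self._timestamps = deque()  # event timestamps in seconds, oldest first

    def mark(self, now):
        self._timestamps.append(now)

    def rate_per_minute(self, now):
        # Forget events that are 10 seconds old or older.
        while self._timestamps and now - self._timestamps[0] >= WINDOW_SECONDS:
            self._timestamps.popleft()
        return len(self._timestamps) * PER_MINUTE_FACTOR


meter = RateMeter()
for second in (1, 2, 3, 4, 5):
    meter.mark(second)

print(meter.rate_per_minute(now=5))   # 30: five events in the window, times 6
print(meter.rate_per_minute(now=12))  # 18: only the events at t=3, 4, 5 remain
print(meter.rate_per_minute(now=20))  # 0: every event has left the window
```

Because the window is short, the rate reacts quickly to bursts and drops back to zero within 10 seconds of the last event.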

Event processor metrics

Event processor metrics provide information about the status of your event processors. You can see an example below.

Example of an event processor graph in AxonIQ Console

The Segment Claim Percentage shows the percentage of segments claimed by the event processor. Under normal circumstances, this should always be 100%.

If the segment claim percentage is lower than 100%, it means that some segments are not claimed by the event processor and a part of your event processing is not happening.

However, if it’s higher than 100%, this means you either have an in-memory token store configured (which can be a valid use-case to process all events on all application instances), or your applications are stealing tokens from each other because the work in a batch takes longer than the configured claim timeout of the token store.
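To make the three cases concrete, the sketch below counts claims across nodes and divides by the number of segments. How AxonIQ Console actually derives the percentage is an assumption here, and the function name and input shape are hypothetical.

```python
def segment_claim_percentage(total_segments, claims_per_node):
    """Return the claimed percentage across all nodes.

    claims_per_node maps a node name to the set of segment ids that the node
    currently claims (a hypothetical input shape used for illustration).
    """
    if total_segments <= 0:
        raise ValueError("total_segments must be positive")
    claimed = sum(len(segments) for segments in claims_per_node.values())
    return 100.0 * claimed / total_segments


# Every segment is claimed exactly once.
print(segment_claim_percentage(4, {"node-a": {0, 1}, "node-b": {2, 3}}))  # 100.0

# Segment 3 is unclaimed, so part of the event stream is not being processed.
print(segment_claim_percentage(4, {"node-a": {0, 1}, "node-b": {2}}))  # 75.0

# Each node claims every segment, as with an in-memory token store.
print(segment_claim_percentage(4, {"node-a": {0, 1, 2, 3}, "node-b": {0, 1, 2, 3}}))  # 200.0
```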

The Ingest latency and Commit latency metrics indicate the amount of time that passes between the publishing of an event and the processing of that event by the event processor. The ingest latency is the time between the publication of the event and the moment it is available for processing. The commit latency is the time between the moment the event is processed and the moment the event processor commits the processing of the event.

The Nodes graph shows the number of nodes currently online that are reporting to have this event processor in their configuration.

Handler metrics

Handler metrics provide information about the performance of your handlers. You can see an example below.

Example of a handler graph in AxonIQ Console

The Overview graph shows a breakdown of the time spent on processing the messages by the handler. The time not accounted for is shown as "Overhead": time spent outside the specific measurements. This can be due to garbage collection, I/O, or other system activities.

The Message Rate shows the rate at which the handler processes messages. It also shows the rate of failed messages.

The Total Time shows the total time spent processing the message by the handler. This metric can vary based on the type of handler. For example, for aggregates it includes the loading of the aggregate from the event store, the processing of the command, and the committing of the events to the event store.

The Handler Time shows the time spent in your handler code.
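The "Overhead" in the Overview graph can be thought of as the total time minus everything that was measured specifically. The sketch below shows that subtraction; the function name and the breakdown keys (`load`, `handler`, `commit`) are hypothetical and do not reflect the exact measurements AxonIQ Console takes.

```python
def overhead_ms(total_ms, measured_ms):
    """Time not covered by any specific measurement, never below zero.

    measured_ms maps a (hypothetical) measurement name to its duration in ms.
    """
    accounted_for = sum(measured_ms.values())
    return max(0.0, total_ms - accounted_for)


breakdown = {"load": 4.0, "handler": 5.0, "commit": 2.0}
print(overhead_ms(total_ms=12.0, measured_ms=breakdown))  # 1.0
```

A large overhead relative to the total time suggests looking outside the handler itself, for example at garbage collection or I/O.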

Aggregate metrics

Aggregate metrics provide information about the performance of your aggregates. As aggregates handle messages, they have the same metrics as Handlers. In addition, they have several special metrics. You can see an example below.

Example of an aggregate graph in AxonIQ Console

The Lock Time shows the time spent acquiring the lock on the aggregate, that is, the time spent waiting for the lock to become available. As aggregates can only handle one command at a time, it is essential to keep this time as low as possible. High values here can indicate a slow event store, long-running actions in the aggregate, or high contention on the aggregate.

The Load Time shows the time spent loading the aggregate from the event store. This is the time spent reading the events from the event store and applying them to the aggregate. This time includes the time spent acquiring the lock on the aggregate as well.

The Event Store Commit Time shows the time spent committing the events to the event store. This is the time spent writing the events to the event store. High values here can indicate a slow event store.

The EventStream Size shows the number of events that need to be read from the event store to load the aggregate. As the stream can grow over time, it’s important to consider enabling snapshots to reduce the number of events that need to be read. Generally, we recommend keeping this number below 250 events.
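As an illustration of the 250-event guideline, the sketch below flags aggregates whose event streams have reached that size. The function name, the input mapping, and the aggregate names are hypothetical; how you obtain the stream sizes depends on your setup.

```python
SNAPSHOT_THRESHOLD = 250  # recommended upper bound from the guideline above


def aggregates_needing_snapshots(stream_sizes, threshold=SNAPSHOT_THRESHOLD):
    """Return the names of aggregates whose stream size is at or above the threshold."""
    return sorted(name for name, size in stream_sizes.items() if size >= threshold)


sizes = {"Order": 1200, "Customer": 40, "Cart": 250}
print(aggregates_needing_snapshots(sizes))  # ['Cart', 'Order']
```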

Application metrics

Application metrics provide information about the performance of your application. This includes information about the memory usage, garbage collection, and threads.

Example of application graphs in AxonIQ Console

The Process CPU Usage shows the percentage of CPU used by the JVM process. Regardless of the number of cores, this number will always be between 0 and 100%.

The System CPU Usage shows the percentage of CPU used by the system. This includes all processes running on the system.

The Heap Usage shows the memory usage of the JVM. The JVM has a heap where all objects are stored, and this graph shows how much of that heap is in use. A percentage that keeps rising without ever dropping can indicate a memory leak. A high percentage can indicate that the JVM is running out of memory. JVMs commit memory only as needed, which makes it possible to run three Java applications configured for 8 GB each on a system with only 8 GB of memory. As such, memory reported as free might not be available once it is needed. Make sure to monitor the memory usage of the system as well, or you might run into issues.
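The arithmetic behind the 8 GB example can be sketched as a simple check of configured maximum heaps against physical memory. Both function names are hypothetical, and the check ignores non-heap memory, so treat it as a rough illustration only.

```python
GIB = 1024 ** 3


def heap_usage_percentage(used_bytes, max_bytes):
    """Share of the maximum heap that is currently in use."""
    return 100.0 * used_bytes / max_bytes


def is_overcommitted(max_heap_bytes_per_jvm, system_memory_bytes):
    """True when the JVMs could together ask for more memory than the system has."""
    return sum(max_heap_bytes_per_jvm) > system_memory_bytes


print(heap_usage_percentage(2 * GIB, 8 * GIB))               # 25.0
print(is_overcommitted([8 * GIB, 8 * GIB, 8 * GIB], 8 * GIB))  # True
print(is_overcommitted([2 * GIB, 2 * GIB], 8 * GIB))           # False
```

In the overcommitted case each JVM can report a low heap usage while the system as a whole is close to running out of memory.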

The CommandBus Capacity shows the percentage of the current CommandBus-threads being used over time. This can be used to determine if the CommandBus is able to keep up with the incoming commands. If this number is consistently high, you may need to increase the number of threads in the CommandBus. Note that this only applies to the AxonServerCommandBus.

The QueryBus Capacity shows the percentage of the current QueryBus-threads being used over time. This can be used to determine if the QueryBus is able to keep up with the incoming queries. If this number is consistently high, you may need to increase the number of threads in the QueryBus. Note that this only applies to the AxonServerQueryBus.
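A rough way to reason about both capacity graphs is to treat capacity as busy threads divided by the size of the thread pool. That formula, the function names, and the 90% threshold below are assumptions made for illustration; they are not taken from AxonIQ Console.

```python
def capacity_percentage(active_threads, pool_size):
    """Share of the bus threads that are busy, as a percentage."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return 100.0 * active_threads / pool_size


def is_consistently_high(samples, threshold=90.0):
    """True when every sample is at or above the (assumed) threshold."""
    return bool(samples) and all(sample >= threshold for sample in samples)


print(capacity_percentage(active_threads=3, pool_size=10))  # 30.0
print(is_consistently_high([95.0, 100.0, 92.5]))            # True
print(is_consistently_high([95.0, 40.0]))                   # False
```

A single spike to 100% is usually less interesting than a series of samples that all stay high, which is why the second function looks at every sample.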

The Live Thread Count shows the number of live threads in the JVM. This includes all threads: the main thread, the garbage collector threads, and the threads used by the application. A high number of threads can indicate a problem in the application.

The System Load shows the load of the system: the number of processes that are using or waiting for CPU time. Interpret this number relative to the number of cores in the system. A system load of 1 means that, on average, one process is using or waiting for CPU time, and a system load of 2 means two. A system load of 8 on an 8-core system means that all cores are busy.
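The 8-core example can be made concrete by relating the load to the number of cores. The helper below is a hypothetical illustration: a result of 1.0 corresponds to all cores being busy.

```python
def load_per_core(system_load, cores):
    """System load relative to the number of cores; 1.0 means all cores are busy."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return system_load / cores


print(load_per_core(8.0, 8))   # 1.0: all cores busy
print(load_per_core(2.0, 8))   # 0.25: plenty of headroom
print(load_per_core(16.0, 8))  # 2.0: twice as much demand as there are cores
```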

The Nodes graph shows the number of nodes currently online.

Correlation

AxonIQ Console always tries to show you the most relevant information. When you view the metrics of the handler of an event processor, we will also show the metrics of that event processor itself, and the application. This way you can easily correlate problems in your application.

Example of combined metrics in AxonIQ Console

AxonIQ Console does the same in the following situations:

When viewing the metrics of an event processor, we will show the metrics of the application and the event processor.
When viewing the metrics of an aggregate, we will show the metrics of the application and the aggregate.
When viewing the metrics of a handler, we will show the metrics of the application and the handler, as well as aggregate metrics if the handler is part of an aggregate, or event processor metrics if the handler is part of an event processor.