Monitoring
AxonIQ Console provides a comprehensive monitoring solution for your Axon Framework applications. By defining conditions based on Framework Metrics, you can get notified when something goes wrong in your application. This way, you can take action before it becomes a problem.
Currently, monitoring is only available for Axon Framework applications. Monitoring for Axon Server instances is planned for a future release.
You can set up conditions for any metrics available in AxonIQ Console, which are collected by the Axon Framework Client for AxonIQ Console automatically. These conditions are checked once per minute by AxonIQ Console. If thresholds are exceeded, an alert is created.
Conditions
In the Monitoring tab, you can set up conditions for all instances of a resource type at once. For example, you can set up a condition that triggers an alert when the ingest latency of any event processor in any application exceeds a certain threshold.
You can add a condition to any resource type by clicking the "Add new condition" button. This adds a new condition to the list that you can configure and then save. The formula has the following parts:
Field | Description | Possible values |
---|---|---|
Level | The level of the alerts, useful for filtering which integration receives which alerts | Incident, Critical, Major, Minor |
Metric | The metric to check | Differs per resource, see available metrics. |
Operator | The operator to use for the check | =, !=, >, <, >=, <= |
Value | The value to compare the metric to | Any number |
Percentile | If the metric is a timer, the percentile to check against. The 90th percentile is generally recommended | Minimum, Median, 90th, 95th, Maximum |
Duration | The number of minutes until the alert is sent to the configured integrations. This helps prevent false positives. | Any number |
The screen shows this in a readable format, so you can think of it as: "Create <level> when <metric> <operator> <value> for <duration> minutes", or "Create critical when segment claim percentage != 100% for 2 minutes". You can see this in the screen below.
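To make the formula concrete, the following Python sketch models a condition and a once-per-minute check. It is an illustration only, not how AxonIQ Console is implemented: the `Condition` class, the `should_alert` function, and the reading of Duration as "the most recent N one-minute readings all breach the threshold" are assumptions made for this example.

```python
import operator
from dataclasses import dataclass
from typing import Sequence

# The comparison operators offered in the condition form.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Condition:
    """Hypothetical model of one condition row."""
    level: str      # Incident, Critical, Major or Minor
    metric: str     # metric name, used here only for display
    op: str         # one of the keys in OPERATORS
    value: float    # threshold the metric is compared to
    duration: int   # minutes before the alert is sent

    def describe(self) -> str:
        # Mirrors the readable format shown on the screen.
        return (f"Create {self.level} when {self.metric} {self.op} "
                f"{self.value:g} for {self.duration} minutes")

    def breached(self, sample: float) -> bool:
        return OPERATORS[self.op](sample, self.value)


def should_alert(condition: Condition, samples: Sequence[float]) -> bool:
    """samples holds one reading per minute, oldest first.

    Assumption: an alert is sent once the most recent `duration`
    readings all breach the threshold.
    """
    if condition.duration <= 0:
        return bool(samples) and condition.breached(samples[-1])
    recent = samples[-condition.duration:]
    return (len(recent) == condition.duration
            and all(condition.breached(s) for s in recent))


claim = Condition("critical", "segment claim percentage", "!=", 100, 2)
print(claim.describe())
print(should_alert(claim, [100, 100, 75]))  # only the last minute breaches
print(should_alert(claim, [100, 75, 50]))   # the last two minutes breach
```

With a duration of 2, the first call returns `False` because a single bad reading is not enough, while the second returns `True`. This is the sense in which a longer duration helps prevent false positives.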
You can always adjust the conditions by clicking the "Edit" button next to the condition. This makes the entire row editable. You can change any field, except the level and metric. If you want to change the level or metric, you need to delete the condition and add a new one.
Specific instances
If you want to set up conditions for a specific instance of a resource, you can do so by navigating to the resource in the application and clicking "Configure" next to the Alerts header in the top right corner. This opens a dialog where you can add a new condition for that specific instance.
Setting up conditions for a specific instance works similarly to setting up conditions for all instances. You can find a list of all available metrics and their defaults below. After adding a specific condition, it can be found in the resource itself, and in the Overrides section of the Monitoring tab. This way you can easily see which resources have specific conditions set up.
In addition, the Conditions section of the Monitoring tab will show "x override(s)" when a resource has specific conditions set up.
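One way to picture the relationship between conditions for all instances and conditions for a specific instance is sketched below in Python. This is a mental model, not documented behavior: the dictionaries, the instance name `order-projection`, and the rule that an instance-specific threshold takes precedence over the one set for all instances are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Tuple

TypeKey = Tuple[str, str]            # (resource type, metric)
InstanceKey = Tuple[str, str, str]   # (resource type, instance name, metric)

# Hypothetical conditions that apply to all instances of a resource type.
type_wide: Dict[TypeKey, float] = {
    ("Event Processor", "Ingest latency"): 100.0,  # ms
    ("Event Processor", "DLQ Size"): 0.0,
}

# Hypothetical conditions set up for one specific instance.
overrides: Dict[InstanceKey, float] = {
    ("Event Processor", "order-projection", "Ingest latency"): 500.0,  # ms
}


def effective_threshold(resource_type: str, instance: str, metric: str) -> float:
    """Assumption: a specific condition wins over the one for all instances."""
    specific = overrides.get((resource_type, instance, metric))
    if specific is not None:
        return specific
    return type_wide[(resource_type, metric)]


def override_labels() -> Dict[str, str]:
    """Builds the "x override(s)" label per resource type."""
    counts = Counter(resource_type for resource_type, _, _ in overrides)
    return {rt: f"{n} override(s)" for rt, n in counts.items()}


print(effective_threshold("Event Processor", "order-projection", "Ingest latency"))
print(effective_threshold("Event Processor", "payment-saga", "Ingest latency"))
print(override_labels())
```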
Alerts
When a condition is met, an alert is created. You can see all alerts in the Alerts section of the Monitoring tab. Each resource page also has an Alerts section where you can see all alerts for that specific resource. You can also see a badge in all tables where resources are listed with the number of alerts, like in the example below.
When you click on a row with alerts, you are taken to the resource page where you can see all alerts for that resource.
Integrations
AxonIQ Console can send alerts to various integrations. Currently, only Slack is supported. More integrations are planned for a future release.
Slack
There are three steps to set up Slack integration:
- Add our Slack app to your workspace
- Connect your Slack workspace to your AxonIQ Console workspace
- Set up the channels to send alerts to
Due to the dynamic nature of Slack, we cannot provide a step-by-step guide here. However, we can provide you with the information you need to set up the integration. You can find this information in the Integrations section of the Monitoring tab.
The IDs and codes in the image above are examples and are not valid. The correct codes are unique to your workspace, and you can find them there.
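Because the level of an alert can be used to filter which integration receives it, it can help to plan the channel layout before configuring it. The Python sketch below shows one possible routing plan. The channel names, the `CHANNEL_LEVELS` mapping, and both functions are hypothetical; the sketch does not call Slack or AxonIQ Console and does not reflect how the filtering is configured in the product.

```python
from typing import Dict, List, Set

# Hypothetical channels and the alert levels each one should receive.
CHANNEL_LEVELS: Dict[str, Set[str]] = {
    "#ops-pager": {"Incident", "Critical"},
    "#ops-alerts": {"Incident", "Critical", "Major"},
    "#ops-noise": {"Minor"},
}


def channels_for(level: str) -> List[str]:
    """Returns the channels whose filter includes the given level."""
    return sorted(ch for ch, levels in CHANNEL_LEVELS.items() if level in levels)


def format_alert(level: str, resource: str, message: str) -> str:
    """Plain-text message layout, chosen only for this example."""
    return f"[{level}] {resource}: {message}"


for channel in channels_for("Critical"):
    print(channel, format_alert("Critical", "order-projection", "DLQ Size > 0"))
```

Sending critical alerts to more than one channel while keeping minor ones in a low-traffic channel is a design choice in this sketch, not a recommendation from the product.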
Available metrics
The following table contains all available metrics and their defaults. Our Solution Engineers have found these defaults to be a good starting point for monitoring. Some of these are automatically set up for you when you start using AxonIQ Console.
Resource | Metric | Default threshold | Set up by default |
---|---|---|---|
Message Handler | Error Rate | > 1% | Yes, Critical |
Message Handler | Latency (P90) | > 200 ms | Yes, Critical |
Message Handler | Throughput | > 1000/minute | No |
Aggregate | Error Rate | > 1% | Yes, Critical |
Aggregate | Latency (P90) | > 200 ms | Yes, Critical |
Aggregate | Lock Time (P90) | > 25 ms | Yes, Major |
Aggregate | Load Time (P90) | > 100 ms | Yes, Major |
Aggregate | Event Commit Time (P90) | > 300 ms | Yes, Major |
Event Processor | Segment Claim Percentage | != 100% | Yes |
Event Processor | Ingest latency | > 100 ms | Yes, Major |
Event Processor | Commit latency | > 300 ms | Yes, Major |
Event Processor | DLQ Size | > 0 | Yes, Critical |
Application | Replica Count | < 1 | Yes, Critical |
Application | CPU Usage | > 80% | Yes, Major |
Application | Host CPU Usage | > 80% | Yes, Major |
Application | Heap Usage | > 80% | Yes, Major |
Application | Thread Count | > 200 | No |
Application | Query Bus Usage | > 80% | Yes, Major |
Application | Command Bus Usage | > 80% | Yes, Major |
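Several defaults in the table above are checked against the 90th percentile (P90) of a timer. The Python sketch below shows what such a check means for a handful of latency readings. It uses the nearest-rank definition of a percentile, which is one of several common definitions and is an assumption here; the sample values and the `percentile` helper are made up for illustration and do not describe how AxonIQ Console computes its metrics.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile requires at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# A few P90 defaults copied from the table above, in milliseconds.
P90_DEFAULTS_MS = {
    ("Message Handler", "Latency (P90)"): 200.0,
    ("Aggregate", "Lock Time (P90)"): 25.0,
    ("Aggregate", "Load Time (P90)"): 100.0,
}

# Made-up handler latencies for one minute, in milliseconds.
latencies_ms = [120, 95, 180, 210, 150, 130, 90, 400, 160, 140]

p90 = percentile(latencies_ms, 90)
threshold = P90_DEFAULTS_MS[("Message Handler", "Latency (P90)")]
print(f"P90 = {p90} ms, median = {percentile(latencies_ms, 50)} ms")
print("breaches default threshold:", p90 > threshold)
```

In this sample the median stays well below 200 ms while the P90 exceeds it, which illustrates why a high percentile surfaces slow outliers that a median would hide.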