Distributed Tracing
Distributed Tracing enables you to track the path of a message through your system to see how the system behaves and performs. Axon Framework provides additional tracing functionality to track what takes time in your microservice, such as how long it took to load the aggregate, how long the actual command invocation took, or how long it took to publish events.
OpenTracing deprecation warning
The OpenTracing extension works in a different way than described on this page. Its functionality is limited and will not be updated to include the additional functionality described on this page. The OpenTracing standard itself is deprecated; please consider moving to OpenTelemetry instead.
Span factories
To provide additional insights into traces, many Axon Framework components use a SpanFactory.
This factory is responsible for the creation of multiple instances of a Span with a specific purpose.
You can use a SpanFactory provided by the framework that matches your tracing standard.
Or, if your tracing standard of choice is not supported, you can create one yourself by implementing the SpanFactory and Span interfaces.
The following standards are currently supported:
| Tracing standard | Supported | Description |
|---|---|---|
| OpenTelemetry | Yes | OpenTelemetry is the successor of OpenTracing, with auto-instrumentation being its most prominent feature. |
| OpenTracing | Limited | OpenTracing is supported by an extension with limited functionality. Usage of OpenTelemetry is recommended instead. |
| SLF4j | Yes | If you have no monitoring system in place but want to trace through logging, the framework provides a LoggingSpanFactory. |
You configure a SpanFactory in the following ways:
Axon Configuration API
public class AxonConfig {
    // omitting other configuration methods...
    public void configure(Configurer configurer) {
        configurer.configureSpanFactory(configuration -> new MyCustomSpanFactory());
    }
}
Spring Boot auto configuration
@Configuration
public class AxonConfig {
    // omitting other configuration methods...
    @Bean
    public SpanFactory spanFactory() {
        // Any bean implementing the SpanFactory will be picked up automatically and override the defaults
        return new MyCustomSpanFactory();
    }
}
Note that this is not necessary for all providers, since some may provide Spring Boot auto-configuration out of the box. To configure the provider of your choice, please refer to the specific subsection on this page.
Terminology
A trace is a collection of one or more spans that together form a complete journey through your software. Creating a span that is not part of a trace will automatically create one with that span being the root span of the trace.
Tools such as ElasticSearch APM can render tracing information, as visible in the following image:
What we observe here is that a command is dispatched, distributed by Axon Server and handled.
As a result of the command an AccountRegisteredEvent is published and a deadline is scheduled as well.
In this image, the AutomaticAccountCommandDispatcher.dispatch span is the root trace, with each span being part of a call hierarchy within that trace.
Combining factories
Sometimes you want the functionality of multiple SpanFactory implementations, while Axon’s configuration only allows one.
For this purpose, the framework contains the MultiSpanFactory that you can configure with multiple factories to which it delegates its calls.
For example, you can configure both the LoggingSpanFactory and the OpenTelemetrySpanFactory in the following fashion:
Axon Configuration API
public class AxonConfig {
    // omitting other configuration methods...
    public void configure(Configurer configurer) {
        configurer.configureSpanFactory(configuration -> new MultiSpanFactory(
                Arrays.asList(
                        LoggingSpanFactory.INSTANCE,
                        OpenTelemetrySpanFactory.builder().build()
                )
        ));
    }
}
Spring Boot auto configuration
@Configuration
public class AxonConfig {
    // omitting other configuration methods...
    @Bean
    public SpanFactory spanFactory() {
        return new MultiSpanFactory(
                Arrays.asList(
                        LoggingSpanFactory.INSTANCE,
                        OpenTelemetrySpanFactory
                                .builder()
                                .build()
                )
        );
    }
}
By configuring the MultiSpanFactory, a single delegating span is created whenever the framework requests one.
This span contains multiple spans, one for each configured factory.
The delegating span makes sure all contained spans are called, so that together they act as a single one.
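The delegation idea can be illustrated with a minimal, framework-free sketch. All types below (`SketchSpan`, `SketchSpanFactory`, `RecordingSpan`, `DelegatingSketchSpan`, `MultiSketchSpanFactory`) are hypothetical stand-ins invented for this illustration; they only mirror the behaviour described above and are not Axon Framework's actual interfaces or implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT Axon Framework's interfaces.
interface SketchSpan {
    void start();
    void end();
}

interface SketchSpanFactory {
    SketchSpan createSpan(String name);
}

// Records what happens to it, standing in for a logging or OpenTelemetry span.
class RecordingSpan implements SketchSpan {
    private final String label;
    private final List<String> journal;

    RecordingSpan(String label, List<String> journal) {
        this.label = label;
        this.journal = journal;
    }

    @Override
    public void start() {
        journal.add(label + ":start");
    }

    @Override
    public void end() {
        journal.add(label + ":end");
    }
}

// Holds one span per configured factory and exposes them to the caller as a single span.
class DelegatingSketchSpan implements SketchSpan {
    private final List<SketchSpan> delegates;

    DelegatingSketchSpan(List<SketchSpan> delegates) {
        this.delegates = delegates;
    }

    @Override
    public void start() {
        delegates.forEach(SketchSpan::start);
    }

    @Override
    public void end() {
        delegates.forEach(SketchSpan::end);
    }
}

// Asks every configured factory for a span and wraps the results in one delegating span.
class MultiSketchSpanFactory implements SketchSpanFactory {
    private final List<SketchSpanFactory> factories;

    MultiSketchSpanFactory(List<SketchSpanFactory> factories) {
        this.factories = factories;
    }

    @Override
    public SketchSpan createSpan(String name) {
        List<SketchSpan> spans = new ArrayList<>();
        for (SketchSpanFactory factory : factories) {
            spans.add(factory.createSpan(name));
        }
        return new DelegatingSketchSpan(spans);
    }
}
```

In this sketch the caller only ever sees one span, while every underlying factory still receives the full start and end lifecycle in the order in which the factories were configured.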
Features
The following functionality in Axon Framework is traced in addition to the tracing capabilities already provided by the standard of your choice:
Tracing all of this functionality provides you with the best possible insight into the performance of your application.
Span types
The configured SpanFactory is responsible for creating spans when the framework requests it.
The framework specifies the type of span, the name, and optionally the message that triggered the span.
The framework can request the span types defined in the following table:
| Span Type | Description |
|---|---|
| Root trace | Create a new trace entirely, having no parent. |
| Dispatch span | A span which is dispatching a message. |
| Handler span | A span which is handling a message. Will set the span that dispatched the message as the parent. |
| Internal span | A span which specifies something internal. It’s not an entry or exit point. |
A trace generally consists of multiple spans with different types, depending on the functionality.
Span nesting
Starting a span will make it a child span of the currently active one. If there’s currently no span active, the new span will become the root span of a new trace.
During invocations which are normally synchronous, Axon Framework will create normal spans which become a child of the currently active one. For example, publishing an event from a command is synchronous, and therefore the publishing span becomes a child of the command handling span.
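The nesting rule for synchronous invocations can be sketched with a per-thread stack of active spans. The `NestedSpan` class below is a hypothetical model written for this page, under the assumption that "currently active" means "most recently started and not yet ended on this thread"; it is not how Axon Framework or any tracing SDK is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of "currently active span" bookkeeping; not Axon's implementation.
class NestedSpan {
    // One stack of active spans per thread; the head is the currently active span.
    private static final ThreadLocal<Deque<NestedSpan>> ACTIVE =
            ThreadLocal.withInitial(ArrayDeque::new);

    final String name;
    final NestedSpan parent; // null means this span is the root of a new trace

    private NestedSpan(String name, NestedSpan parent) {
        this.name = name;
        this.parent = parent;
    }

    // Starting a span makes it a child of the active span, or a root if none is active.
    static NestedSpan start(String name) {
        Deque<NestedSpan> stack = ACTIVE.get();
        NestedSpan span = new NestedSpan(name, stack.peek());
        stack.push(span);
        return span;
    }

    // Ending a span makes its parent (if any) the active span again.
    void end() {
        Deque<NestedSpan> stack = ACTIVE.get();
        if (stack.peek() != this) {
            throw new IllegalStateException("Spans must be ended in reverse start order");
        }
        stack.pop();
    }

    boolean isRoot() {
        return parent == null;
    }
}
```

Starting a command handling span and then an event publishing span on the same thread yields a publishing span whose parent is the handling span, which matches the command and event example above. A span started after both have ended becomes a root again.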
Asynchronous invocations come in two flavours, and the framework handles them differently:
- Network-asynchronous, caller-synchronous: the call crosses a network boundary (typically Axon Server gRPC) but the caller still waits for a reply. Distributed commands and distributed queries fall into this category.
- Fire-and-forget asynchronous: the call is dispatched and may be handled minutes, hours, or days later, often during a replay. Streaming event processors, deadline firing and subscription query updates fall into this category.
For the first flavour, the receiving side joins the dispatcher’s trace by default, because the caller is actively waiting and benefits from seeing the whole path. For the second flavour, the framework starts a new root trace by default, because folding a long-deferred handler into the original trace would spread that trace over arbitrary time spans and make it unreadable.
The exact default per component:
| Component | Default behaviour | Override |
|---|---|---|
| Distributed command (over Axon Server) | Joins dispatcher’s trace as a child | |
| Distributed query (over Axon Server) | Joins dispatcher’s trace as a child | |
| Streaming event processor | New root trace per batch ( | |
| Subscribing event processor | Child of the publishing span (synchronous invocation) | None (handler runs inside the publishing unit of work) |
| Deadline firing | New root trace with an OpenTelemetry link back to the scheduling span | Not configurable |
| Subscription query updates | New root trace with an OpenTelemetry link back to the listening query span | Not configurable |
| Snapshot creation | Child of the trigger’s trace (command that prompted the snapshot) | |
| Saga | Same as the underlying event processor (a saga is invoked by an event processor) | See streaming/subscribing event processor rows |
Some standards, like OpenTelemetry, support linking. By linking one span to another, they become correlated despite being part of a different trace. Tooling that supports this creates links the user can click, allowing for easy navigation between related traces. This is incredibly useful to see causation within your system. Components in the table above marked "link back to…" rely on this feature.
For OpenTelemetry-based setups, propagation across these boundaries depends on the W3C trace context being injected into the message metadata at dispatch time and extracted again at handling time. The OpenTelemetrySpanFactory does this automatically via its TextMapPropagator. Make sure the factory is wired with a real propagator (see Activation paths). Without it, none of the same-trace nesting or span-link behaviour above is observable, because the parent span context never reaches the handler. Every property described here is also listed in the Configuration table together with the rest of the Spring Boot tracing settings.
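To make the inject and extract steps concrete, the following sketch writes a W3C `traceparent` value into a plain metadata map and reads it back. The `TraceParentSketch` class and its methods are hypothetical helpers invented for this illustration; they are not part of Axon Framework or OpenTelemetry, and the sketch skips details of the specification such as rejecting all-zero identifiers. Real applications should rely on the configured propagator rather than hand-written code like this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating W3C "traceparent" propagation through a metadata map.
final class TraceParentSketch {
    static final String KEY = "traceparent";

    // version "00", 32 hex chars of trace id, 16 hex chars of parent span id, 2 hex chars of flags
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    private TraceParentSketch() {
    }

    // Dispatch side: copy the metadata and add the current span context to the copy.
    static Map<String, String> inject(Map<String, String> metadata,
                                      String traceId, String spanId, boolean sampled) {
        Map<String, String> copy = new HashMap<>(metadata);
        copy.put(KEY, "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00"));
        return copy;
    }

    // Handling side: returns {traceId, parentSpanId}, or empty if nothing valid was propagated.
    static Optional<String[]> extract(Map<String, String> metadata) {
        String header = metadata.get(KEY);
        if (header == null) {
            return Optional.empty();
        }
        Matcher matcher = FORMAT.matcher(header);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[]{matcher.group(1), matcher.group(2)});
    }
}
```

When `extract` returns an empty result, the handling side has no parent span context to attach to. This is the situation described above in which neither same-trace nesting nor span links can be observed.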
Span attribute providers
Most tracing implementations can add additional attributes to spans.
This is useful when debugging your application or finding a specific span you are looking for.
The framework provides the SpanAttributesProvider, which can be registered to the SpanFactory either via its builder (if supported) or by calling the SpanFactory.registerSpanAttributeProvider(provider) method.
The following SpanAttributesProvider implementations are included in Axon Framework:
| Class | Label | Description |
|---|---|---|
| | | The aggregate identifier of the message, only present in case of a |
| | | The identifier of the message |
| | | The name of the message for Commands and Queries |
| | | The class of the message, such as |
| | | The class of the payload in the message |
| | | All metadata of the message is also added to the span with its corresponding key |
In addition to the ones provided by the framework, you can also create a custom SpanAttributesProvider and add it to the SpanFactory.
Use this if you want to add custom information to spans as labels.
public class CustomSpanAttributesProvider implements SpanAttributesProvider {
    @Nonnull
    @Override
    public Map<String, String> provideForMessage(@Nonnull Message<?> message) {
        // Provide your labels based on the message here
        return Collections.emptyMap();
    }
}
You can register this custom SpanAttributesProvider in one of the following ways.
Axon Configuration API
public class AxonConfig {
    // omitting other configuration methods...
    public void configure(Configuration configuration) {
        configuration.spanFactory().registerSpanAttributeProvider(new CustomSpanAttributesProvider());
    }
}
Spring Boot auto configuration - bean creation
@Configuration
public class AxonConfig {
    // omitting other configuration methods...
    @Bean
    public SpanAttributesProvider customSpanAttributesProvider() {
        // Auto-configuration picks beans of type SpanAttributesProvider up automatically.
        return new CustomSpanAttributesProvider();
    }
}
OpenTelemetry
Axon Framework provides OpenTelemetry support out of the box. The OpenTelemetry standard improves upon the OpenTracing and OpenCensus standards by providing more auto-instrumentation without the need for the user to configure many things.
OpenTelemetry works by adding a Java agent to the execution of the application. Based on the configuration, the agent will collect logs, metrics and traces automatically before sending them to a collector that can provide insights. ElasticSearch APM, Jaeger and many other tools are available for collecting and visualizing the information. The configuration of these tools is beyond the scope of this guide. You can find more information in the "Getting Started" section of the OpenTelemetry documentation.
OpenTelemetry "supports a lot of libraries, frameworks and application servers out of the box."
For example, when a Spring REST endpoint is called it will automatically start a trace.
With the axon-tracing-opentelemetry module, this trace will be propagated to all subsequent Axon Framework messages.
For example, if the REST call produces a command which is sent over Axon Server, handling the command will be included in the same trace as the original REST call.
Activation paths
There are three common ways to put an OpenTelemetry SDK in place. They all work with Axon Framework, but they differ in how the SDK is exposed to OpenTelemetrySpanFactory:
| Path | How the SDK is exposed | What Axon needs |
|---|---|---|
| OpenTelemetry Java agent ( | The agent registers an | |
| Manual SDK setup with | The application calls | Same as the Java agent: builder defaults find the SDK via |
| Spring Boot 3 with | Spring Boot creates an | The auto-configuration injects the Spring-managed |
Configuration
To enable OpenTelemetry support, add the following dependency to your application:
Maven
<dependency>
<groupId>org.axonframework</groupId>
<artifactId>axon-tracing-opentelemetry</artifactId>
<version>${axon-framework.version}</version>
</dependency>
Gradle
implementation group: 'org.axonframework', name: 'axon-tracing-opentelemetry', version: axonFrameworkVersion
Depending on your application, more configuration might be needed.
Spring Boot auto configuration
When using the Spring Boot auto-configuration of Axon Framework, most things will be autoconfigured regardless of the implementation.
You might want to configure certain settings that are available. The following table contains all configurable settings, their defaults, and what they change:
| Setting | Default | Description |
|---|---|---|
| | | Whether to show event sourcing handlers as a trace. This can be noisy and is disabled by default. |
| | | Whether to add the aggregate identifier as a label when handling a message. |
| | | Whether to add the message identifier as a label when handling a message. |
| | | Whether to add the message name as a label when handling a message. |
| | | Whether to add the message type as a label when handling a message. |
| | | Whether to add the payload type as a label when handling a message. |
| | | Whether to add the metadata properties as labels when handling a message. |
| | | Whether distributed commands (handled remotely after a hop through Axon Server) should be part of the same trace as the dispatching span. Set to |
| | | Whether distributed queries should be part of the same trace as the dispatching span. Set to |
| | | Whether the spans of a |
| | | Time limit between an event being published and being processed by a |
| | | By default, a |
| | | Name of the span attribute used to record the deadline id. |
| | | Name of the span attribute used to record the deadline scope. |
| | | Name of the span attribute used to record the saga identifier. |
| | | Whether snapshot creation should run in its own root trace instead of being a child of the command that triggered it. |
| | | Whether the aggregate type should appear in the |
The defaults are deliberately asymmetric: command and query buses default to same-trace propagation because a caller is actively waiting for a synchronous reply and benefits from seeing the whole path in a single trace. Streaming event processors default to a new root trace per batch because events can be processed far in the future (during replays, slow consumers, segment claim transfer, …) and folding them into the original trace would spread a single trace over arbitrary time spans, making it unreadable. Enable |
Manual configuration
The OpenTelemetry support can also be configured using the Configurer of Axon Framework to configure the OpenTelemetrySpanFactory.
public class AxonConfig {
    // omitting other configuration methods...
    public void configure(Configurer configurer) {
        // Register the OpenTelemetry-backed factory on the Configurer, as in the earlier examples
        configurer.configureSpanFactory(c -> OpenTelemetrySpanFactory.builder().build());
    }
}
Note that when not using Spring Boot, tracing each message handler invocation is not supported due to a limitation.
OpenTracing
The OpenTracing standard is deprecated. If necessary, you can still use the OpenTracing extension of Axon Framework.
Note that the functionality of this extension is rather limited compared to the OpenTelemetry integration. Because of this, it’s recommended to switch to OpenTelemetry if possible.
Logging
Sometimes you don’t have an APM system available, for instance, during local development.
It might still be useful to see the traces that would be started and finished to obtain insights.
For this purpose, the framework provides a LoggingSpanFactory.
You can configure the LoggingSpanFactory in the following ways:
Traced components
Axon Framework provides a large range of components that are traced by the configured SpanFactory.
The spans created by each component are available for reference in this section, with additional information about how they should be interpreted.
It’s important to note that the availability of these spans is highly dependent on the application configuration.
For instance, some components are only used when using Axon Server, or you might have created your own CommandBus implementation which does not call the SpanFactory API.
Commands
The CommandBus is instrumented to create spans for both dispatching and handling commands.
The tracing differs based on whether you are using Axon Server.
The following tabs show the possible traces.
Axon Server
When using the AxonServerCommandBus, there will be two handling and dispatch traces since it uses a second CommandBus to invoke the command locally after receiving it from Axon Server.
In addition, you can see the gRPC-call to Axon Server and the time it took to handle the call.
| Trace name | Description |
|---|---|
| | The bus is dispatching the command to Axon Server. |
| | The bus has received a command and is handling it. |
| | The localSegment invocation, dispatching the command locally. |
| | The localSegment is handling the command. |
| | The aggregate is being loaded by the repository. During this time Axon Framework will obtain a lock, fetch snapshots and events from the event store to hydrate the aggregate. |
| | The repository is obtaining a lock for the aggregate. If this takes some time, the command was queued because another command was being handled for the same aggregate. |
Without Axon Server
| Trace name | Description |
|---|---|
| | The bus is dispatching the command locally. |
| | The bus is invoking the handler locally. |
| | The aggregate is being loaded by the repository. During this time Axon Framework will obtain a lock, fetch snapshots and events from the event store to hydrate the aggregate. |
| | The repository is obtaining a lock for the aggregate. If this takes some time, the command was queued because another command was being handled for the same aggregate. |
During handling of commands, other functionality might be invoked such as scheduling deadlines or publishing events. Please refer to the specific sections of this functionality for more information.
Events
When publishing events, spans are created to indicate the event being published.
Each event that is being published has its own specific publishing span.
Subscribing event processors handle the event synchronously inside the same trace.
For streaming event processors (and sagas backed by one), the relationship to the publishing span depends on configuration: by default each batch of events forms its own root trace with no link back; setting axon.tracing.eventProcessor.disableBatchTrace=true makes each handler invocation a separate root trace with an OpenTelemetry link back to the publishing span (clickable in the APM UI); setting axon.tracing.eventProcessor.distributedInSameTrace=true makes the handler a child of the publishing trace (within the configured time limit). See Span nesting for the full matrix.
| Trace name | Description |
|---|---|
| | For each event, a short span is created to indicate that an event was published. |
| | Indicates events being committed to the event store. |
Event processors
Event processor invocations are traced as well. Since Streaming Event Processors are asynchronous, a new root trace is created for each batch of events by default. Subscribing event processors, on the other hand, will become part of the current trace because they are invoked synchronously.
Streaming event processors
| Trace name | Description |
|---|---|
| | Root trace of handling the event, includes all interceptor invocations. |
| | Inner span of handling the event, after all interceptors have been invoked. |
| | Wraps a batch of events handled by the processor in a single trace, with each |
Streaming event processor tracing exposes three configurable behaviours, controlled via the properties listed in Span nesting:
- axon.tracing.eventProcessor.distributedInSameTrace (default false): when enabled, a …process(${EventClass}) span becomes a child of the trace that published the event, provided the event is handled within …distributedInSameTraceTimeLimit (default 2 minutes). Older events still start a new root trace so that long-running replays do not produce arbitrarily long traces.
- axon.tracing.eventProcessor.disableBatchTrace (default false): disables the …batch parent span. Each event handler then starts its own root trace, with an OpenTelemetry link back to the publishing span. Choose this when you want clickable cause/effect navigation per event in the APM UI without bundling unrelated events into one batch span.
- These two are independent. Setting both to false (the default) gives you the traditional behaviour: each batch wrapped in a …batch root span. Setting distributedInSameTrace=true typically also requires disableBatchTrace=true to avoid a confusing mix of a batch root and same-trace nesting.
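The interplay of these settings can be summarised as a small decision function. The sketch below is a hypothetical reading of the prose above, not Axon Framework code: the `EventSpanPlacement` class, its `Placement` values and the `decide` method are invented names, the time limit is assumed to be inclusive, and an event that is too old for same-trace nesting is assumed to fall back to whatever the batch setting dictates.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical decision helper mirroring the prose above; it is not Axon Framework code.
final class EventSpanPlacement {

    enum Placement {
        CHILD_OF_PUBLISHING_TRACE, // handler span joins the trace that published the event
        ROOT_LINKED_TO_PUBLISHER,  // handler span is its own root, linked back to the publishing span
        CHILD_OF_BATCH_ROOT        // handler span sits under a batch span that is a new root
    }

    private EventSpanPlacement() {
    }

    static Placement decide(boolean distributedInSameTrace,
                            boolean disableBatchTrace,
                            Duration timeLimit,
                            Instant publishedAt,
                            Instant handledAt) {
        Duration age = Duration.between(publishedAt, handledAt);
        // Assumption: an event handled exactly at the limit still counts as "within" it.
        if (distributedInSameTrace && age.compareTo(timeLimit) <= 0) {
            return Placement.CHILD_OF_PUBLISHING_TRACE;
        }
        if (disableBatchTrace) {
            return Placement.ROOT_LINKED_TO_PUBLISHER;
        }
        return Placement.CHILD_OF_BATCH_ROOT;
    }
}
```

With both flags left at false the function always answers `CHILD_OF_BATCH_ROOT`, which corresponds to the default behaviour described above. With both flags enabled, recent events join the publishing trace and older ones start a linked root trace.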
Subscribing event processors
| Trace name | Description |
|---|---|
| | The event is being handled by the subscribing event processor. Always a child span of the current trace (subscribing processors are invoked synchronously inside the publishing unit of work). |
Deadlines
Any action related to deadlines is traced in order to gain insight into what happened during specific calls. Mutations on deadlines generally happen from another root trace, such as a command or saga. The handling span of a deadline will be linked to the scheduling span for easy navigation.
| Trace name | Description |
|---|---|
| | A deadline was scheduled. |
| | A deadline was cancelled based on name and |
| | All deadlines with a specific name were cancelled. |
| | All deadlines within a specific scope with a specific name were cancelled. |
| | Root trace of a deadline firing, containing the name and payload class. |
Snapshotting
By default, the snapshot creation spans are part of the same trace as the command that triggered them. This matches the framework’s runtime behaviour, where the snapshot is created right after the command’s unit of work commits, and keeps the cause (the command) and the effect (snapshot creation, including its performance characteristics) visible together in a single trace.
Set axon.tracing.snapshotter.separateTrace=true to put snapshot creation into its own root trace, linked back to the triggering command’s span. This is useful when snapshot creation runs on a separate executor that you’d rather not see folded into command latency. The trade-off is the extra click to navigate from the command to the snapshot creation. See the Configuration table for the property reference.
| Trace name | Description |
|---|---|
| | A snapshot creation task is being submitted. Depending on performance, the executor might take a while to pick it up. |
| | The |
When separateTrace=true, the …createSnapshot($aggregateClass) span becomes the root of the snapshot trace and does not contain the aggregate identifier, so the APM tool groups any Snapshotter calls of the same aggregate type together.
Sagas
Sagas are a special type of event processor that can invoke multiple sagas for a single event.
Because of this, the AbstractSagaManager has been instrumented with additional tracing information.
These spans are descendants of an event processor span that invokes the manager.
| Trace name | Description |
|---|---|
| | A matching saga has been found and is being invoked. |
| | The manager is constructing a new saga. |
Queries
Queries support tracing in all of their forms. In order to be clear about how they work, this section is split based upon the query’s type. For all types, the created spans will differ based on whether Axon Server is used or not. The spans that are only available with Axon Server are marked as such.
Direct queries
Direct queries fetch a single result (either a single item or a single list) and receive no updates. Traces will differ based on whether Axon Server is used or not. The following tabs show the possible traces.
Axon Server
| Trace name | Description |
|---|---|
| | The requesting service is dispatching the query. |
| | The handling service is handling the query request in a task. |
| | The handling service is handling the query. |
| | The requesting service is processing the response. |
Streaming queries
Streaming queries look similar to the traces of a Direct query.
They do not contain a ResponseProcessingTask span since their results are directly published to the invoker of the query.
Traces will differ based on whether Axon Server is used or not.
The following tabs show the possible traces.
Axon Server
| Trace name | Description |
|---|---|
| | The requesting service is dispatching the query. |
| | The handling service is handling the query request in a task. |
| | The handling service is handling the streaming query. |
Scatter-gather queries
Scatter-Gather queries are like a direct query but can fetch results from multiple services at the same time. Part of the trace can thus be duplicated multiple times, since multiple services are invoked. Traces will differ based on whether Axon Server is used or not. The following tabs show the possible traces.
Axon Server
| Trace name | Description |
|---|---|
| | The requesting service is dispatching the query. |
| | The handling service is handling the query request in a task. |
| | Each handling service is handling the query. Each handler within the same service has its own index. |
Subscription queries
Subscription queries are traced differently from other queries. Subscription queries have an initial result, which is traced like a direct query. However, new results can be published at any later time while the caller is still subscribed.
In order to prevent malformed traces, since most APM tools have a maximum span time before flushing them, publication of new results is not part of the original trace.
However, invocations of the SimpleQueryUpdateEmitter will be linked to the span of the queries that are listening to it, so the original call can easily be found.
The QueryUpdateEmitter traces will look like the following table:
| Trace name | Description |
|---|---|
| | A new update is emitted. |
| | A new update is emitted for a specific consumer. |
In addition to this, the spans of the direct queries section apply as well.
Message handler invocations
The TracingHandlerEnhancerDefinition automatically creates a span for each message handler invocation within your application.
This is true for commands, events, queries and even custom message handlers.
Spans will be created with the following format:
ContainingClassName.methodName(ArgumentClass1, ArgumentClass2, etc).
Examples of this are:
- RoomAvailabilityHandler.on(RoomAddedEvent)
- Account(RegisterAccountCommand,DeadlineManager)
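As an illustration of this naming format, the sketch below derives such a name from a reflective handle on a method or constructor. `HandlerSpanNames` is a hypothetical helper written for this page and is not the code used by the TracingHandlerEnhancerDefinition. `RoomAddedEvent` and `RoomAvailabilityHandler` are empty stubs named after the example above. The sketch joins argument types with a bare comma, following the second example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Executable;
import java.util.Arrays;
import java.util.stream.Collectors;

// Stub types standing in for application classes; only their names matter here.
class RoomAddedEvent {
}

class RoomAvailabilityHandler {
    RoomAvailabilityHandler(String roomId) {
    }

    void on(RoomAddedEvent event) {
    }
}

// Hypothetical helper that reproduces the naming format described above; not Axon's code.
final class HandlerSpanNames {
    private HandlerSpanNames() {
    }

    static String of(Executable handler) {
        String owner = handler.getDeclaringClass().getSimpleName();
        String arguments = Arrays.stream(handler.getParameterTypes())
                                 .map(Class::getSimpleName)
                                 .collect(Collectors.joining(","));
        // Constructors have no method name, so only the class name is used as the prefix.
        String prefix = handler instanceof Constructor ? owner : owner + "." + handler.getName();
        return prefix + "(" + arguments + ")";
    }
}
```

Passing the `on` method of the stub handler produces `RoomAvailabilityHandler.on(RoomAddedEvent)`, and passing its constructor produces `RoomAvailabilityHandler(String)`, which has the same shape as the constructor-based command handler example above.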
The TracingHandlerEnhancerDefinition functionality is autoconfigured for Spring Boot, with event sourcing handlers turned off by default.
This is because loading an aggregate might invoke many of these handlers, hitting the maximum number of spans for your APM tool.
Please refer to the Spring Boot configuration section if you want to enable this.