Backups

The core strategy employed by Axon Server to keep data available is to replicate it over multiple cluster nodes. These nodes should be in availability zones that are isolated from each other in relevant disaster scenarios. With the introduction of dedicated backup nodes in Axon Server 4.3, it has become easier to set up and operate Axon Server without ever making explicit backups to off-line media. Nevertheless, there are environments where such backups are a strict requirement, and for that reason Axon Server does support them.

There are three types of items that need to be backed up: the control database, the event store segments, and (for Axon Server EE clusters) the log entry segments.

To support the creation of consistent backups, Axon Server provides a REST API. This API provides three controllers to perform backup operations:

  • Backup Info Rest Controller - End point for Axon SE/EE for event stream segment backup

  • Backup Control DB Rest Controller - End point for Axon SE for control database backup

  • Cluster Backup Info Rest Controller - End points for Axon EE for control database and log entry segment backup

The API documentation is accessible at http://[server]:[port]/swagger-ui.html.

Control database

The control database is a relational H2 database and contains important configuration information for your Axon Server SE/EE deployment. Although it is stored in a single file, this file cannot simply be copied for a backup, as it may not be in a safe state.

For Axon Server SE, a call to the POST endpoint `http://[server]/createControlDbBackup` forces the creation of a proper backup file.

For Axon Server EE, a call to the POST endpoint `http://[server]/v1/backup/createControlDbBackup` forces the creation of a proper backup file. The [server] can be any node within the cluster that serves the _admin context.

In both cases, the call returns the full path to the backup file (a .zip), which can then be used to copy the file to another storage medium.
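As a minimal sketch of this step, the following Python snippet requests a controlDB backup and copies the resulting archive to off-line storage. The host, port, and destination directory are illustrative assumptions, and the snippet assumes the returned path is reachable from where the script runs (for example via a shared volume). For Axon Server SE, the endpoint path would be `/createControlDbBackup` instead.

```python
import shutil
import requests

AXON = "http://axonserver:8024"           # assumed host/port of an Axon Server node
DESTINATION = "/mnt/backups/controldb/"   # assumed off-line storage location

# Ask Axon Server to write a consistent controlDB backup; the response body
# contains the full path of the resulting .zip on the Axon Server node.
response = requests.post(f"{AXON}/v1/backup/createControlDbBackup")
response.raise_for_status()
backup_path = response.text.strip()

# Copy the .zip to the backup medium (assumes the path is locally reachable).
shutil.copy(backup_path, DESTINATION)
```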

Event store segments

The event store segments are either closed and immutable, or still open for new events. For the closed segments, it suffices to back up only the ones that have not been backed up yet, since segments that have already been backed up are guaranteed not to change.

For both Axon Server SE and EE, a call to the GET endpoint http://[server]/v1/backup/filenames, with the event type (either EVENT or SNAPSHOT), the context name, and optionally the last segment that has already been backed up, returns a list of file names belonging to segments that have not been backed up yet, but which are now safe to back up by simply copying them.

For Axon Server SE, the [server] is the single Axon Server SE node, while for Axon Server EE, the [server] can be any node that is a PRIMARY member of the context that needs to be backed up.

In addition, you may choose to back up the current segment file that is being written to. These are the files whose names are larger than the last entry returned by the filenames backup endpoint. It is important to overwrite this file with subsequent backups, because no guarantees can be given about its completeness. This also means the name of this file should not be used to construct the "lastSegmentBackedUp" value in subsequent requests to the backup endpoint.
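The sketch below illustrates this flow under a few assumptions: the query parameter names (type, context, lastSegmentBackedUp), the host and port, and the destination directories are illustrative, and the returned file names are assumed to be paths reachable from where the script runs.

```python
import shutil
import requests

AXON = "http://axonserver:8024"  # SE: the single node; EE: a PRIMARY node for the context

def backup_closed_segments(context, event_type, last_segment, destination):
    """Copy closed, not-yet-backed-up segment files of one type (EVENT or SNAPSHOT)."""
    params = {"type": event_type, "context": context}  # parameter names are assumptions
    if last_segment is not None:
        params["lastSegmentBackedUp"] = last_segment
    response = requests.get(f"{AXON}/v1/backup/filenames", params=params)
    response.raise_for_status()
    filenames = response.json()
    for filename in filenames:
        shutil.copy(filename, destination)  # closed segments are immutable, a plain copy is safe
    return filenames

# Back up segments of the "default" context created after segment 42 / 7 (example values).
backup_closed_segments("default", "EVENT", 42, "/mnt/backups/events/")
backup_closed_segments("default", "SNAPSHOT", 7, "/mnt/backups/snapshots/")
```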

Note

From Axon Server SE version 4.5.12 / Axon Server EE version 4.5.17 onwards, there is a new endpoint available: /v1/backup/eventstore. This endpoint returns a JSON object with the files to back up, including the currently active event store segment. It also returns the number of the last closed event store segment. This number can be used in subsequent backups to retrieve only the files updated since the last backup.
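A minimal sketch of using this newer endpoint could look as follows; the query parameter and the JSON field names (files, lastClosedSegment) are assumptions, so check the Swagger documentation of your version for the exact contract.

```python
import shutil
import requests

AXON = "http://axonserver:8024"  # assumed host/port

# Query parameter and field names below are assumptions; verify via swagger-ui.
response = requests.get(f"{AXON}/v1/backup/eventstore", params={"context": "default"})
response.raise_for_status()
info = response.json()

for filename in info.get("files", []):        # includes the currently active segment
    shutil.copy(filename, "/mnt/backups/events/")

# Store the last closed segment number; pass it along on the next run so that
# only files updated since this backup are returned.
last_closed = info.get("lastClosedSegment")
print(f"remember for the next run: {last_closed}")
```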

Log entry segments (only for Axon Server clusters)

Unlike the event store segments, the log entry segments backup should not be done incrementally: all files are replaced by the next backup. The log entry segments backup is supported by the GET endpoint http://[server]/v1/backup/log/filenames. It takes the replication group name and returns a list of file names that completely replace the previous backup for that replication group. The [server] can be any node that is a PRIMARY member of the replication group that needs to be backed up.
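Because this backup is a full replacement, a sketch of this step would clear the previous copy before copying the newly returned files. The query parameter name, host, port, and destination below are assumptions.

```python
import os
import shutil
import requests

AXON = "http://axonserver:8024"            # assumed: a PRIMARY node for the replication group
DESTINATION = "/mnt/backups/log/default"   # assumed backup location for this replication group

# The query parameter name is an assumption; the endpoint takes the replication group name.
response = requests.get(f"{AXON}/v1/backup/log/filenames",
                        params={"replicationGroup": "default"})
response.raise_for_status()

# Log entry backups are not incremental: replace the previous backup entirely.
shutil.rmtree(DESTINATION, ignore_errors=True)
os.makedirs(DESTINATION, exist_ok=True)
for filename in response.json():
    shutil.copy(filename, DESTINATION)
```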

Even if the most recent file has incomplete data, a node will be able to recover a consistent state from such a file and will initialize itself at the position immediately after the last complete write. The replication process (if present) will ensure that subsequent entries are automatically synchronized.

Because the control database contains a pointer to the last log entry that is known to be stored safely on the cluster (the commit index), the proper order is to first create the control database backup and then back up the log entry segments and the event store segments.

This ensures that the log entry segments may contain entries beyond the commit index (which is fine), but that no entries are missing before the commit index (which would be bad). The log entry segments must be backed up within *30 minutes* after the backup of the controlDB, to prevent the log compaction procedure from causing data inconsistencies.
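Putting the pieces together, the ordering could be sketched as follows; the host, port, and query parameter names are assumptions, and copying the returned files to off-line storage is left out for brevity.

```python
import requests

AXON = "http://axonserver:8024"  # assumed host/port; for EE, pick nodes with the required roles

def full_backup(context="default", replication_group="default"):
    """Sketch of the recommended ordering; query parameter names are assumptions."""
    # 1. Control database first: it records the commit index.
    controldb_zip = requests.post(f"{AXON}/v1/backup/createControlDbBackup").text.strip()

    # 2. Log entry segments next, within 30 minutes of step 1
    #    (a full replacement of the previous log backup).
    log_files = requests.get(f"{AXON}/v1/backup/log/filenames",
                             params={"replicationGroup": replication_group}).json()

    # 3. Event store segments last (EVENT shown; repeat for SNAPSHOT).
    event_files = requests.get(f"{AXON}/v1/backup/filenames",
                               params={"type": "EVENT", "context": context}).json()

    return controldb_zip, log_files, event_files
```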