Workflow Lifecycle
A workflow moves through a series of states during its lifetime. This section covers how workflows terminate—both intentionally and unexpectedly—and how to react to state changes with lifecycle listeners.
Each workflow instance runs on a virtual thread. Virtual threads are lightweight: a suspended workflow (for example, one waiting for an event) consumes almost no memory or OS resources, so you can run thousands of concurrent workflow instances without issue. In a future version, the engine will support offloading long-running suspended workflows to disk, freeing memory entirely until they are resumed.
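To make the "thousands of suspended instances" claim concrete, here is a minimal sketch that uses only plain JDK 21 virtual threads. It does not use the workflow engine: the class name, the latch that stands in for an awaited event, and the instance count are all assumptions chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: plain JDK 21 virtual threads, not the workflow engine.
final class SuspendedWorkflowsSketch {

    /** Starts `count` virtual threads that wait on a latch, then releases them all. */
    static int runSuspended(int count) {
        CountDownLatch resume = new CountDownLatch(1);   // stands in for "the awaited event"
        AtomicInteger finished = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                executor.submit(() -> {
                    try {
                        resume.await();                  // parks the virtual thread, not an OS thread
                        finished.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            resume.countDown();                          // "event arrives": waiting instances resume
        }                                                // close() waits for every task to finish
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println(runSuspended(10_000) + " simulated workflows resumed");
    }
}
```

Each waiting task holds only a small heap-allocated stack while parked, which is why this pattern scales to instance counts that would be impractical with one platform thread per workflow.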
Workflow states
A workflow is always in one of these states:
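| Status | Description |
|---|---|
| | The workflow is actively executing steps. |
| COMPLETED | The workflow method returned normally and all work is done. Async steps that are still running at that point are cancelled first (see Normal completion with running steps). |
| FAILED | The workflow was explicitly failed via fail. |
| CANCELLED | The workflow was cancelled via cancel. |
| TIMED_OUT | The workflow exceeded its overall timeout. |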
COMPLETED, FAILED, CANCELLED, and TIMED_OUT are terminal states—the workflow will not be retried or resumed.
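The terminal-state rule can be sketched as a small enum. This is a hypothetical mirror of the states, not the engine's own status type: the four terminal names come from the text above, while `RUNNING` is an assumed name for the active state, and both helper methods are invented for illustration.

```java
// Hypothetical mirror of the workflow states, for illustration only.
enum SketchStatus {
    RUNNING,    // assumed name for the active state; the text does not spell it out
    COMPLETED, FAILED, CANCELLED, TIMED_OUT;

    /** Every state except the active one is terminal. */
    boolean isTerminal() {
        return this != RUNNING;
    }

    /** A transition is only legal out of the active state and into a terminal one. */
    boolean canTransitionTo(SketchStatus next) {
        return !isTerminal() && next.isTerminal();
    }

    public static void main(String[] args) {
        for (SketchStatus s : values()) {
            System.out.println(s + " terminal=" + s.isTerminal());
        }
    }
}
```

Modelling "will not be retried or resumed" as a guard on transitions makes the rule checkable: once a workflow is in any terminal state, `canTransitionTo` returns false for every target.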
The workflow context
The SimpleWorkflowContext passed to your workflow method is your single entry point to all engine capabilities:
| Category | What you can do |
|---|---|
| Workflow data | |
| Execute actions | |
| Wait for events | |
| Manage state | |
| Terminate | |
| Orchestrate | |
In Kotlin, the equivalent is Kontext, which provides the same capabilities with idiomatic Kotlin syntax—payload as a property, named parameters with defaults, and kotlin.time.Duration support.
You can also build your own custom workflow context with domain-specific methods. See Custom Workflow Context.
fail() vs. cancel()
Both terminate the workflow, but they signal different intent and are called from different places:
| Method | Workflow status | Called from | Use case |
|---|---|---|---|
| fail | FAILED | Inside the workflow | The workflow detected an error and cannot continue. |
| cancel | CANCELLED | Outside the workflow | An external action (user, admin, API) wants to stop a running workflow. |
fail—internal termination
Use fail inside your workflow code when the business logic determines the workflow should stop.
If the workflow has running steps (for example, parallel steps started with execute), fail cancels all of them first, producing cancellation events for each, before publishing the workflow failure event:
// Start two steps in parallel (non-blocking)
var shipping = ctx.execute("shipOrder", ctx.workflowPayload(),
        ShippingService::shipOrder,
        Duration.ofMinutes(5), defaults());
var notification = ctx.execute("notifyCustomer", ctx.workflowPayload(),
        NotificationService::notifyCustomer,
        Duration.ofMinutes(1), defaults());
// Meanwhile, check stock; if it is unavailable, fail the workflow
var reserved = ctx.awaitExecute("reserveStock", Boolean.class,
        InventoryService::reserveStock);
if (!reserved) {
    ctx.fail(new RuntimeException("Stock unavailable")); // (1)
}
(1) Cancels shipOrder and notifyCustomer first, then terminates the workflow with FAILED status.
Running steps are cancelled first—each CANCELLED event carries the reason as a WorkflowFailedException. Then the workflow itself is terminated with a FAILED event that records the original exception (RuntimeException in this case), not the wrapper.
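The ordering described above can be modelled with a toy event log. Everything in this sketch is hypothetical: the `Event` record, the `fail` helper, and the local `WorkflowFailedException` class (which only borrows its name from the text) are stand-ins, not the engine's real types.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the "cancel steps first, then fail the workflow" ordering.
// All types are hypothetical; none of this is the engine's real API.
final class FailOrderingSketch {

    /** Local stand-in for the wrapper carried by each step's CANCELLED event. */
    static final class WorkflowFailedException extends RuntimeException {
        WorkflowFailedException(Throwable cause) {
            super(cause);
        }
    }

    record Event(String type, String subject, Throwable reason) {}

    /** Cancels every running step first, then records the workflow failure. */
    static List<Event> fail(List<String> runningSteps, RuntimeException original) {
        List<Event> log = new ArrayList<>();
        for (String step : runningSteps) {
            // Each step's CANCELLED event carries the wrapper as its reason.
            log.add(new Event("CANCELLED", step, new WorkflowFailedException(original)));
        }
        // The workflow-level event keeps the original exception, not the wrapper.
        log.add(new Event("FAILED", "workflow", original));
        return log;
    }

    public static void main(String[] args) {
        List<Event> log = fail(List.of("shipOrder", "notifyCustomer"),
                new RuntimeException("Stock unavailable"));
        log.forEach(e -> System.out.println(e.type() + " " + e.subject()));
    }
}
```

Keeping the original exception on the workflow event means a consumer of the log sees the business reason directly, while the step events still make clear that they were cancelled as a consequence of the workflow failing.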
cancel—external termination
Use cancel from outside the workflow—for example, when a user or API wants to stop a running workflow.
Like fail, cancel first cancels all running steps, then terminates the workflow.
Given a workflow with parallel steps running:
// Inside the workflow: two long-running steps in parallel
var shipping = ctx.execute("shipOrder", ctx.workflowPayload(),
        ShippingService::shipOrder,
        Duration.ofMinutes(5), defaults());
var notification = ctx.execute("notifyCustomer", ctx.workflowPayload(),
        NotificationService::notifyCustomer,
        Duration.ofMinutes(1), defaults());
An external caller cancels the workflow while those steps are still running:
var execution = workflowExecutionRepository.findById(workflowId);
execution.cancel("Cancelled by user");
All running steps are cancelled first, then the workflow itself is terminated with a CANCELLED event (not FAILED).
Normal completion with running steps
The same "cancel first, terminate after" rule applies when the workflow method simply returns while async steps are still running.
A step started with execute(…) and never awaited keeps running in the background.
When your workflow body returns, the engine cancels any such step and waits for its CANCELLED event to be written before publishing the CompletedWorkflow event:
public void execute(SimpleWorkflowContext ctx) {
    // Fire-and-forget: note there is no .await() on the returned result
    ctx.execute("sendReceipt", ctx.workflowPayload(),
            NotificationService::sendReceipt,
            Duration.ofMinutes(1), defaults());
    // The workflow body returns while sendReceipt is still running
}
This guarantees that every started step has a terminal event in the log—no silent losses from a step that would have failed after the workflow had already completed.
If you need the workflow to wait for the step's actual result, call .await() on the result returned by execute(…).
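The completion rule above can be sketched with `CompletableFuture` as a stand-in for running steps. This is a simplified model under stated assumptions: `CompletionSketch`, `start`, `finish`, and the `chargeCard` step name are hypothetical, and a real engine would write each step's event when the step ends rather than in one pass at the end.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Toy model: when the workflow body returns, unfinished steps are cancelled
// before the workflow's own completion is recorded. Not the engine's real API.
final class CompletionSketch {
    // Insertion-ordered so the log lists steps in the order they were started.
    private final Map<String, CompletableFuture<String>> steps = new LinkedHashMap<>();
    private final List<String> log = new ArrayList<>();

    /** Registers a step that is still in flight (hypothetical stand-in for execute). */
    CompletableFuture<String> start(String name) {
        CompletableFuture<String> future = new CompletableFuture<>();
        steps.put(name, future);
        return future;
    }

    /** Called when the workflow body returns. */
    List<String> finish() {
        steps.forEach((name, future) -> {
            if (future.isDone()) {
                log.add(name + ":COMPLETED");
            } else {
                future.cancel(true);              // still running: cancel it first
                log.add(name + ":CANCELLED");
            }
        });
        log.add("workflow:COMPLETED");            // only now is the workflow completed
        return List.copyOf(log);
    }

    public static void main(String[] args) {
        CompletionSketch workflow = new CompletionSketch();
        workflow.start("chargeCard").complete("ok");   // this step finished in time
        workflow.start("sendReceipt");                 // never awaited, still running
        workflow.finish().forEach(System.out::println);
    }
}
```

The invariant to look for is that every step name passed to `start` appears in the log exactly once, with a terminal suffix, before the workflow's own entry.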
Unhandled exceptions
An unhandled exception is one that escapes the workflow method without an explicit call to fail. Always ensure your workflow has explicit terminal paths.
Lifecycle listeners
You can register listeners that are called when a workflow reaches a specific state. This is useful for cleanup, notifications, or triggering follow-up processes.
Annotation-based listeners
The simplest way is to annotate methods on your workflow class:
public class OrderFulfillmentWorkflow {

    @Workflow(idProperty = "orderId", startOnEvent = "OrderPlaced")
    public void execute(SimpleWorkflowContext ctx) {
        // ... workflow logic
    }

    @OnSuccess // (1)
    public void onCompleted(WorkflowStatus status, SimpleWorkflowContext ctx) {
        logger.info("Order {} fulfilled successfully!", ctx.workflowPayload().get("orderId"));
    }

    @OnFailure // (2)
    public void onFailed(WorkflowStatus status, SimpleWorkflowContext ctx) {
        logger.warn("Order {} failed: {}", ctx.workflowPayload().get("orderId"), status);
    }

    @OnCancellation // (3)
    public void onCancelled(WorkflowStatus status, SimpleWorkflowContext ctx) {
        logger.info("Order {} was cancelled", ctx.workflowPayload().get("orderId"));
    }

    @OnTimeout // (4)
    public void onTimedOut(WorkflowStatus status, SimpleWorkflowContext ctx) {
        logger.warn("Order {} timed out", ctx.workflowPayload().get("orderId"));
    }
}
(1) Called when the workflow completes successfully (COMPLETED status).
(2) Called when the workflow fails (FAILED status).
(3) Called when the workflow is cancelled (CANCELLED status).
(4) Called when the workflow times out (TIMED_OUT status).
Each listener method receives the WorkflowStatus and the WorkflowContext, giving you access to the workflow payload and ID.
Lifecycle listeners must be defined in the same class as the workflow method.
Programmatic listeners
For more control, register listeners via the declarative configuration:
.customized((c, w) -> w
    .registerWorkflowStatusChangeListener(WorkflowStatus.COMPLETED,
        (status, context) -> {
            logger.info("Workflow {} completed", context.workflowId());
        })
    .registerWorkflowStatusChangeListener(WorkflowStatus.FAILED,
        (status, context) -> {
            logger.warn("Workflow {} failed", context.workflowId());
        })
)
You can also unregister listeners:
w.unregisterWorkflowStatusChangeListener(WorkflowStatus.COMPLETED, myListener);
Lifecycle listeners are called after the terminal event is published. They run in the context of the workflow execution and have access to the full workflow payload.
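As a rough illustration of the register, unregister, and "publish first, notify after" behaviour, here is a self-contained registry sketch. It is not the engine's implementation: `ListenerRegistrySketch`, its nested `Status` enum, and the choice to pass the workflow ID as a plain `String` are assumptions, and it assumes listeners are matched by reference when unregistering.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical status-change listener registry, for illustration only.
final class ListenerRegistrySketch {
    enum Status { COMPLETED, FAILED, CANCELLED, TIMED_OUT }

    private final Map<Status, List<BiConsumer<Status, String>>> listeners =
            new EnumMap<>(Status.class);
    private final List<String> publishedEvents = new ArrayList<>();

    void register(Status status, BiConsumer<Status, String> listener) {
        listeners.computeIfAbsent(status, s -> new CopyOnWriteArrayList<>()).add(listener);
    }

    /** Removes the listener; returns false if it was never registered for this status. */
    boolean unregister(Status status, BiConsumer<Status, String> listener) {
        List<BiConsumer<Status, String>> registered = listeners.get(status);
        return registered != null && registered.remove(listener);
    }

    List<String> publishedEvents() {
        return List.copyOf(publishedEvents);
    }

    void terminate(String workflowId, Status status) {
        publishedEvents.add(workflowId + ":" + status);        // 1. publish the terminal event
        for (BiConsumer<Status, String> listener : listeners.getOrDefault(status, List.of())) {
            listener.accept(status, workflowId);               // 2. then notify listeners
        }
    }

    public static void main(String[] args) {
        ListenerRegistrySketch registry = new ListenerRegistrySketch();
        BiConsumer<Status, String> myListener =
                (status, id) -> System.out.println(id + " reached " + status);
        registry.register(Status.COMPLETED, myListener);
        registry.terminate("wf-1", Status.COMPLETED);          // listener fires
        registry.unregister(Status.COMPLETED, myListener);
        registry.terminate("wf-2", Status.COMPLETED);          // listener no longer fires
    }
}
```

Under the by-reference assumption, a listener written as an inline lambda cannot be removed later, so keep it in a variable (like `myListener`) if you intend to unregister it.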