Workflow Lifecycle

A workflow moves through a series of states during its lifetime. This section covers how workflows terminate—both intentionally and unexpectedly—and how to react to state changes with lifecycle listeners.

Each workflow instance runs on a virtual thread. Virtual threads are lightweight—a suspended workflow (for example, waiting for an event) consumes almost no memory or OS resources, so you can run thousands of concurrent workflow instances without issue. In a future version, the engine will support offloading long-running suspended workflows to disk, freeing memory entirely until they are resumed.
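To make the cost of a suspended workflow concrete, here is a minimal sketch that uses only the JDK (Java 21 or later), not the engine. The class name VirtualThreadSuspensionSketch and the use of a CountDownLatch to stand in for "waiting for an event" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: plain JDK virtual threads, no engine types involved.
final class VirtualThreadSuspensionSketch {

    // Starts `count` virtual threads that all park on one latch (the "event"),
    // releases them together, and returns how many resumed.
    static int runSuspended(int count) {
        CountDownLatch event = new CountDownLatch(1);
        AtomicInteger resumed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>(count);

        for (int i = 0; i < count; i++) {
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    event.await();              // parked: no OS thread is held while waiting
                    resumed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        event.countDown();                      // the "event" arrives, every waiter resumes
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while joining", e);
        }
        return resumed.get();
    }

    public static void main(String[] args) {
        System.out.println(runSuspended(10_000) + " suspended tasks resumed");
    }
}
```

Ten thousand platform threads parked the same way would each reserve a native stack. The virtual threads only keep a small heap-allocated continuation while they wait.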

Workflow states

A workflow is always in one of these states:

Status | Description

STARTED | The workflow is actively executing steps.

COMPLETED | The workflow method returned normally and all work is done. Any async steps (started via execute(…) without a follow-up await()) that are still running are cancelled before the workflow transitions to COMPLETED.

FAILED | The workflow was explicitly failed via ctx.fail().

CANCELLED | The workflow was cancelled via ctx.cancel() or execution.cancel().

TIMED_OUT | The workflow exceeded its overall timeout.

COMPLETED, FAILED, CANCELLED, and TIMED_OUT are terminal states—the workflow will not be retried or resumed.
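The terminal/non-terminal split can be sketched as a small enum. SketchStatus and isTerminal() are hypothetical names used only for illustration. The engine's own WorkflowStatus type may or may not offer a comparable helper.

```java
// Hypothetical mirror of the five states listed above; not the engine's WorkflowStatus.
enum SketchStatus {
    STARTED, COMPLETED, FAILED, CANCELLED, TIMED_OUT;

    // Every state except STARTED is terminal: no retry and no resume.
    boolean isTerminal() {
        return this != STARTED;
    }
}
```

A guard such as `if (status.isTerminal()) return;` is a typical use, for example before attempting to cancel a workflow that may already have finished.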

The workflow context

The SimpleWorkflowContext passed to your workflow method is your single entry point to all engine capabilities:

Workflow data:
  workflowId(): the unique workflow instance ID
  workflowPayload(): the current workflow payload
  workflowStatus(): the current status (STARTED, COMPLETED, FAILED, …)
  workflowStepNames(): names of all steps executed so far

Execute actions:
  awaitExecute(…): run an action and wait for the result
  execute(…): run an action without waiting (returns WorkflowStepResult)

Wait for events:
  awaitEvent(…): block until an external event arrives
  waitForEvent(…): wait without blocking (returns WorkflowStepResult)
  sleep(…): pause the workflow for a duration

Manage state:
  setPayload(…): durably update the workflow payload

Terminate:
  fail(…): terminate the workflow with a failure
  cancel(…): cancel the workflow
  cancelStep(…): cancel a single step

Orchestrate:
  allMatch(…), anyMatch(…), noneMatch(…): combine multiple non-blocking results

In Kotlin, the equivalent is Kontext, which provides the same capabilities with idiomatic Kotlin syntax—payload as a property, named parameters with defaults, and kotlin.time.Duration support.

You can also build your own custom workflow context with domain-specific methods. See Custom Workflow Context.

fail() vs. cancel()

Both terminate the workflow, but they signal different intent and are called from different places:

Method | Workflow status | Called from | Use case

ctx.fail(new RuntimeException("reason")) | FAILED | Inside the workflow | The workflow detected an error and cannot continue. Called from your workflow code.

ctx.cancel("reason") | CANCELLED | Outside the workflow | An external action (user, admin, API) wants to stop a running workflow.

fail—internal termination

Use fail inside your workflow code when the business logic determines the workflow should stop.

If the workflow has running steps (for example, parallel steps started with execute), fail cancels all of them first, producing cancellation events for each, before publishing the workflow failure event:

// Start two steps in parallel (non-blocking)
var shipping = ctx.execute("shipOrder", ctx.workflowPayload(),
                            ShippingService::shipOrder,
                            Duration.ofMinutes(5), defaults());

var notification = ctx.execute("notifyCustomer", ctx.workflowPayload(),
                                NotificationService::notifyCustomer,
                                Duration.ofMinutes(1), defaults());

// Meanwhile, check stock — if unavailable, fail the workflow
var reserved = ctx.awaitExecute("reserveStock", Boolean.class,
                                 InventoryService::reserveStock);
if (!reserved) {
    ctx.fail(new RuntimeException("Stock unavailable")); // (1)
}
1 Cancels shipOrder and notifyCustomer first, then terminates the workflow with FAILED status.

Resulting event log:

1 | cancelled | ShipOrderCancelled | {"reason": "WorkflowFailedException: Stock unavailable"}
2 | cancelled | NotifyCustomerCancelled | {"reason": "WorkflowFailedException: Stock unavailable"}
3 | failed | OrderFulfillmentWorkflow#ExecuteFailed | {"reason": "RuntimeException: Stock unavailable"}

Running steps are cancelled first—each CANCELLED event carries the reason as a WorkflowFailedException. Then the workflow itself is terminated with a FAILED event that records the original exception (RuntimeException in this case), not the wrapper.
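The ordering described above can be modelled in a few lines. This is a toy model under stated assumptions, not the engine's implementation: TerminationOrderSketch, startStep, and the string-based event log are all hypothetical and exist only to show the sequence "cancel running steps, then record the workflow failure with the original exception type".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types or methods belong to the engine.
final class TerminationOrderSketch {
    private final Set<String> runningSteps = new LinkedHashSet<>();
    private final List<String> eventLog = new ArrayList<>();

    void startStep(String name) {
        runningSteps.add(name);
    }

    // Cancel every running step first, then record the workflow failure.
    void fail(RuntimeException cause) {
        for (String step : runningSteps) {
            // Step cancellations carry the wrapper type as their reason.
            eventLog.add("cancelled " + step
                    + " reason=WorkflowFailedException: " + cause.getMessage());
        }
        runningSteps.clear();
        // The workflow-level event records the original exception type, not the wrapper.
        eventLog.add("failed workflow reason="
                + cause.getClass().getSimpleName() + ": " + cause.getMessage());
    }

    List<String> eventLog() {
        return List.copyOf(eventLog);
    }

    static List<String> demo() {
        TerminationOrderSketch sketch = new TerminationOrderSketch();
        sketch.startStep("shipOrder");
        sketch.startStep("notifyCustomer");
        sketch.fail(new RuntimeException("Stock unavailable"));
        return sketch.eventLog();
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Running main prints two cancelled lines followed by one failed line, in the same order as the event log shown earlier.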

cancel—external termination

Use cancel from outside the workflow—for example, when a user or API wants to stop a running workflow. Like fail, cancel first cancels all running steps, then terminates the workflow.

Given a workflow with parallel steps running:

// Inside the workflow — two long-running steps in parallel
var shipping = ctx.execute("shipOrder", ctx.workflowPayload(),
                            ShippingService::shipOrder,
                            Duration.ofMinutes(5), defaults());

var notification = ctx.execute("notifyCustomer", ctx.workflowPayload(),
                                NotificationService::notifyCustomer,
                                Duration.ofMinutes(1), defaults());

An external caller cancels the workflow while those steps are still running:

var execution = workflowExecutionRepository.findById(workflowId);
execution.cancel("Cancelled by user");

Resulting event log:

1 | cancelled | ShipOrderCancelled | {"reason": "WorkflowCancelledException: Cancelled by user"}
2 | cancelled | NotifyCustomerCancelled | {"reason": "WorkflowCancelledException: Cancelled by user"}
3 | cancelled | OrderFulfillmentWorkflow#ExecuteCancelled | {"reason": "WorkflowCancelledException: Cancelled by user"}

All running steps are cancelled first, then the workflow itself is terminated with a CANCELLED event (not FAILED).

Normal completion with running steps

The same "cancel first, terminate after" rule applies when the workflow method simply returns while async steps are still running. A step started with execute(…​) and never awaited keeps running in the background. When your workflow body returns, the engine cancels any such step and waits for its CANCELLED event to be written before publishing the CompletedWorkflow event:

public void execute(SimpleWorkflowContext ctx) {
    // fire-and-forget — notice there is no .await() on the returned result
    ctx.execute("sendReceipt", ctx.workflowPayload(),
                NotificationService::sendReceipt,
                Duration.ofMinutes(1), defaults());

    // workflow body returns while sendReceipt is still running
}

Resulting event log:

1 | cancelled | SendReceiptCancelled | {"reason": "StepCancellationException: Workflow completed"}
2 | completed | OrderFulfillmentWorkflow#ExecuteCompleted | {}

This guarantees that every started step has a terminal event in the log—no silent losses from a step that would have failed after the workflow had already completed.

If you need the workflow to wait for the step’s actual result, call .await() on the returned WorkflowStepResult, use awaitExecute(…​), or combine results with allMatch / anyMatch / noneMatch. See Execute Steps and Step Orchestration.
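The difference between awaiting a step and leaving it fire-and-forget can be illustrated with an analogy built on the JDK's CompletableFuture. This is not the engine's WorkflowStepResult: the class AwaitVersusFireAndForget and its method are hypothetical, and the final "cancel whatever is still running" line only imitates the cleanup the engine is described as performing when the workflow body returns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

// Analogy only: CompletableFuture stands in for an async step result.
final class AwaitVersusFireAndForget {

    // Returns true if the step had to be cancelled when the "workflow body" returned.
    static boolean stepCancelledOnReturn(boolean awaitStep) {
        CountDownLatch release = new CountDownLatch(1);

        // The "step" cannot finish until the latch is released.
        CompletableFuture<String> step = CompletableFuture.supplyAsync(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "receipt sent";
        });

        if (awaitStep) {
            release.countDown();   // let the step finish
            step.join();           // the body waits for the actual result
        }

        // Imitated cleanup on return: cancel the step if it is still running.
        boolean cancelled = !step.isDone() && step.cancel(true);

        // CompletableFuture.cancel does not interrupt the worker, so unblock it explicitly.
        release.countDown();
        return cancelled;
    }

    public static void main(String[] args) {
        System.out.println("awaited, cancelled on return: " + stepCancelledOnReturn(true));
        System.out.println("fire-and-forget, cancelled on return: " + stepCancelledOnReturn(false));
    }
}
```

The awaited variant returns false because the step finished before the body returned. The fire-and-forget variant returns true because the step was still in flight.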

Unhandled exceptions

If a workflow exits because of an unhandled exception, that is, without an explicit call to fail or cancel, it does not reach a terminal state. The engine will retry it on the next restart.

Always ensure your workflow has explicit terminal paths—call fail or cancel for every error condition, or let the workflow complete normally.
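One way to guarantee an explicit terminal path is to wrap the workflow body so that unexpected exceptions are routed to fail. The sketch below is a pattern illustration under assumptions: FailableContext is a hypothetical one-method stand-in for the context (the real SimpleWorkflowContext has many more methods), and ExplicitTerminalPaths.runGuarded is not part of the engine.

```java
// Hypothetical stand-in for the single context method this sketch needs.
interface FailableContext {
    void fail(RuntimeException cause);
}

// Pattern sketch: turn any unexpected exception into an explicit fail(...) call.
final class ExplicitTerminalPaths {

    static void runGuarded(FailableContext ctx, Runnable body) {
        try {
            body.run();
        } catch (RuntimeException e) {
            // Without this catch the exception would escape the workflow method
            // and the workflow would be left in a non-terminal state.
            ctx.fail(e);
        }
    }

    public static void main(String[] args) {
        runGuarded(
                cause -> System.out.println("failed: " + cause.getMessage()),
                () -> { throw new IllegalStateException("payment gateway unreachable"); });
    }
}
```

Whether a blanket catch is appropriate depends on the workflow: it gives up the retry-on-restart behaviour for every exception type. If the engine uses exceptions of its own for control flow, a real implementation would need to let those propagate, which this sketch does not attempt.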

Lifecycle listeners

You can register listeners that are called when a workflow reaches a specific state. This is useful for cleanup, notifications, or triggering follow-up processes.

Annotation-based listeners

The simplest way is to annotate methods on your workflow class:

public class OrderFulfillmentWorkflow {

    @Workflow(idProperty = "orderId", startOnEvent = "OrderPlaced")
    public void execute(SimpleWorkflowContext ctx) {
        // ... workflow logic
    }

    @OnSuccess // (1)
    public void onCompleted(WorkflowStatus status, SimpleWorkflowContext ctx) {
        logger.info("Order {} fulfilled successfully!", ctx.workflowPayload().get("orderId"));
    }

    @OnFailure // (2)
    public void onFailed(WorkflowStatus status, SimpleWorkflowContext ctx) {
        logger.warn("Order {} failed: {}", ctx.workflowPayload().get("orderId"), status);
    }

    @OnCancellation // (3)
    public void onCancelled(WorkflowStatus status, SimpleWorkflowContext ctx) {
        logger.info("Order {} was cancelled", ctx.workflowPayload().get("orderId"));
    }

    @OnTimeout // (4)
    public void onTimedOut(WorkflowStatus status, SimpleWorkflowContext ctx) {
        logger.warn("Order {} timed out", ctx.workflowPayload().get("orderId"));
    }
}
1 Called when the workflow completes successfully (COMPLETED status).
2 Called when the workflow fails (FAILED status).
3 Called when the workflow is cancelled (CANCELLED status).
4 Called when the workflow times out (TIMED_OUT status).

Each listener method receives the WorkflowStatus and the workflow context (SimpleWorkflowContext in the example above), giving you access to the workflow payload and ID.

Lifecycle listeners must be defined in the same class as the @Workflow method. Only methods on the workflow instance’s class (and its type hierarchy) are scanned for listener annotations. Use the workflowName attribute (for example, @OnSuccess(workflowName = "myWorkflow")) to bind a listener to a specific workflow when multiple workflows are defined in the same class.

Programmatic listeners

For more control, register listeners via the declarative configuration:

.customized((c, w) -> w
        .registerWorkflowStatusChangeListener(WorkflowStatus.COMPLETED,
                (status, context) -> {
                    logger.info("Workflow {} completed", context.workflowId());
                })
        .registerWorkflowStatusChangeListener(WorkflowStatus.FAILED,
                (status, context) -> {
                    logger.warn("Workflow {} failed", context.workflowId());
                })
)

You can also unregister listeners:

w.unregisterWorkflowStatusChangeListener(WorkflowStatus.COMPLETED, myListener);

Lifecycle listeners are called after the terminal event is published. They run in the context of the workflow execution and have access to the full workflow payload.
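The register, unregister, and "called after the terminal event is published" behaviour can be sketched with a small registry. Everything here is hypothetical: StatusListenerRegistrySketch, its nested Status enum, and the use of a workflow ID string in place of the context are simplifications, not the engine's types.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Toy registry: the listener receives (status, workflowId) instead of a real context.
final class StatusListenerRegistrySketch {
    enum Status { STARTED, COMPLETED, FAILED, CANCELLED, TIMED_OUT }

    private final Map<Status, List<BiConsumer<Status, String>>> listeners =
            new EnumMap<>(Status.class);

    void register(Status status, BiConsumer<Status, String> listener) {
        listeners.computeIfAbsent(status, s -> new ArrayList<>()).add(listener);
    }

    // Removal is by reference, so the caller must keep the instance it registered.
    boolean unregister(Status status, BiConsumer<Status, String> listener) {
        List<BiConsumer<Status, String>> registered = listeners.get(status);
        return registered != null && registered.remove(listener);
    }

    // Publishes the terminal event first, then notifies the listeners for that status.
    void terminate(String workflowId, Status status, List<String> eventLog) {
        eventLog.add(status + " " + workflowId);
        for (BiConsumer<Status, String> listener
                : List.copyOf(listeners.getOrDefault(status, List.of()))) {
            listener.accept(status, workflowId);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        StatusListenerRegistrySketch registry = new StatusListenerRegistrySketch();
        BiConsumer<Status, String> myListener = (status, id) -> log.add("listener " + id);

        registry.register(Status.COMPLETED, myListener);
        registry.terminate("wf-1", Status.COMPLETED, log);
        registry.unregister(Status.COMPLETED, myListener);
        registry.terminate("wf-2", Status.COMPLETED, log);

        log.forEach(System.out::println);
    }
}
```

The sketch also shows why the unregister call above takes a myListener variable: an inline lambda passed at registration time cannot be referenced again later, so keep the listener in a field or variable if you intend to remove it.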