Troubleshooting

This page provides solutions to common issues when running Axon Server Proxy.

Startup issues

Proxy fails to start

Symptom: Proxy exits immediately or fails to start.

Common Causes:

  1. Missing required configuration

    Error: Binding validation failed for configuration properties

    Solution: Ensure proxy.servers and proxy.port are configured in proxy.properties or via environment variables.

  2. Port already in use

    Error: Address already in use

    Solution: Check if another process is using the port:

    # Linux/macOS
    lsof -i :8124
    
    # Windows
    netstat -ano | findstr :8124

    Either stop the other process or change proxy.port.

  3. Invalid TLS configuration

    Error: Failed to load private key or certificate

    Solution: Verify TLS files exist and are readable:

    ls -l /path/to/tls/server-key.pem
    ls -l /path/to/tls/server-cert.pem

  4. Cannot reach Axon Server

    WARN: Failed to connect to backend server

    Solution: Verify Axon Server is running and reachable:

    nc -zv axonserver.local 8124
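
The configuration and TLS checks above can be combined into a small preflight script. The following is a minimal sketch, assuming the settings live in a properties file such as proxy.properties; the function names are illustrative and not part of the proxy, and settings supplied through environment variables are not inspected.

```shell
# Preflight sketch. Function names are illustrative assumptions,
# not part of the proxy itself.

# Print the value of a key from a properties file (the last definition wins).
get_prop() {
  local key file esc
  key="$1"; file="$2"
  esc=$(printf '%s' "$key" | sed 's/[.]/\\./g')   # match dots literally
  sed -n "s/^[[:space:]]*${esc}[[:space:]]*[=:][[:space:]]*//p" "$file" | tail -n 1
}

# Report each required key that is missing or empty; non-zero status if any is.
check_required_props() {
  local file status key
  file="$1"; status=0
  for key in proxy.servers proxy.port; do
    if [ -z "$(get_prop "$key" "$file")" ]; then
      echo "missing: $key"
      status=1
    fi
  done
  return "$status"
}

# Report each file that is missing or unreadable; non-zero status if any is.
check_readable() {
  local status f
  status=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "unreadable: $f"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_required_props proxy.properties && check_readable /path/to/tls/server-key.pem /path/to/tls/server-cert.pem` succeeds only when both required keys are set and both TLS files can be read by the current user.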

Connection issues

Applications cannot connect to proxy

Symptom: Applications fail to connect or time out.

Diagnostic Steps:

  1. Verify proxy is listening

    netstat -an | grep 8124

  2. Test gRPC connectivity

    # Using grpcurl
    grpcurl -plaintext localhost:8124 list
    
    # Basic TCP test
    telnet localhost 8124

  3. Check firewall rules

    # Linux firewall
    sudo iptables -L -n | grep 8124
    
    # Check if SELinux is blocking
    sudo ausearch -m avc -ts recent

  4. Verify application configuration

    The application must point at the proxy address rather than at Axon Server directly:

    axon.axonserver.servers=proxy.local:8124
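
When the application lists several addresses, each one can be probed in turn. The sketch below assumes a comma-separated list of host:port entries, as in the axon.axonserver.servers value above, and that nc is installed; the function names are illustrative, and IPv6 literals are not handled.

```shell
# Split a comma-separated server list into one "host port" pair per line.
# Entries without an explicit port fall back to the given default port.
split_servers() {
  local list default_port
  list="$1"; default_port="$2"
  printf '%s\n' "$list" | tr ',' '\n' | while IFS= read -r entry; do
    entry=$(printf '%s' "$entry" | tr -d '[:space:]')
    [ -n "$entry" ] || continue
    case "$entry" in
      *:*) printf '%s %s\n' "${entry%%:*}" "${entry##*:}" ;;
      *)   printf '%s %s\n' "$entry" "$default_port" ;;
    esac
  done
}

# Try a TCP connection to every server in the list (requires nc).
probe_servers() {
  split_servers "$1" "$2" | while read -r host port; do
    if nc -z -w 3 "$host" "$port" </dev/null 2>/dev/null; then
      echo "ok   $host:$port"
    else
      echo "FAIL $host:$port"
    fi
  done
}
```

Running `probe_servers "proxy.local:8124" 8124` from the application host prints one line per address. A FAIL line points at the network or firewall rather than at the application configuration.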

TLS handshake failures

Symptom: Connection fails with TLS/SSL errors.

io.grpc.StatusRuntimeException: UNAVAILABLE:
  io exception: javax.net.ssl.SSLHandshakeException

Solutions:

  1. Certificate mismatch

    Verify the certificate matches the hostname:

    openssl x509 -in server-cert.pem -text -noout | grep DNS

  2. Expired certificate

    Check the validity dates:

    openssl x509 -in server-cert.pem -noout -dates

  3. Client doesn’t trust certificate

    Application must trust the CA that signed the proxy’s certificate.

    For testing, try with verification disabled (not for production):

    # An empty cert-file disables verification (insecure!)
    axon.axonserver.ssl-enabled=true
    axon.axonserver.cert-file=

  4. Wrong TLS configuration

    Ensure proxy TLS is enabled if applications expect TLS:

    proxy.tlsEnabled=true
    proxy.tlsKey=/path/to/key.pem
    proxy.tlsCert=/path/to/cert.pem
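
Two of the checks above lend themselves to scripting: certificate expiry, and whether the configured key belongs to the configured certificate. The following is a minimal sketch using standard openssl subcommands; the function names are illustrative, and an unencrypted private key is assumed.

```shell
# Succeed if the certificate stays valid for at least the given number of days.
cert_valid_for_days() {
  local cert days
  cert="$1"; days="$2"
  openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}

# Succeed if the private key and the certificate carry the same public key.
# Returns 2 if either file cannot be parsed.
key_matches_cert() {
  local key cert from_key from_cert
  key="$1"; cert="$2"
  from_key=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 2
  from_cert=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 2
  [ -n "$from_key" ] && [ "$from_key" = "$from_cert" ]
}
```

For example, `cert_valid_for_days /path/to/cert.pem 30 || echo "renew soon"` flags a certificate that expires within 30 days, and `key_matches_cert /path/to/key.pem /path/to/cert.pem` catches a key and certificate from different pairs.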

Proxy cannot connect to Axon Server

Symptom: Proxy starts but cannot reach backend.

WARN: Connection to backend server failed: UNAVAILABLE

Solutions:

  1. DNS resolution failure

    # Test DNS
    nslookup axonserver.local
    dig axonserver.local
    
    # If DNS resolution fails, try the IP address instead (in proxy.properties)
    proxy.servers=192.168.1.10:8124

  2. Network routing issues

    # Test connectivity
    ping axonserver.local
    traceroute axonserver.local

  3. Backend TLS misconfiguration

    If Axon Server uses TLS, enable TLS for the backend connection and point the proxy at the CA certificate it should trust:

    proxy.internalTlsEnabled=true
    proxy.internalTrustCerts=/path/to/ca-cert.pem

  4. Wrong port

    Ensure proxy.servers points to the Axon Server gRPC port (default 8124), not the HTTP port (default 8024).

Performance issues

High latency

Symptom: Slow response times through the proxy.

Diagnostic:

  1. Check proxy metrics

    curl http://localhost:8080/prometheus | grep grpc

  2. Compare direct vs proxy latency

    Test if latency is from proxy or backend:

    • Connect application directly to Axon Server

    • Measure latency difference

Solutions:

  1. Network latency

    Ensure proxy is network-close to both applications and Axon Server.

  2. Resource constraints

    Check CPU and memory via Prometheus metrics:

    curl http://localhost:8080/prometheus | grep -E 'system_cpu_usage|jvm_memory_used_bytes'

    For JAR deployments, increase JVM heap if needed:

    java -Xmx2g -jar axon-server-proxy.jar

    JVM heap settings do not apply to the native binary.

  3. Too many connections

    Check connection count:

    curl http://localhost:8080/connections

    Consider running multiple proxy instances with load balancing.
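
Raw grep output from /prometheus lists one sample per label combination, which makes totals hard to read. The sketch below sums all samples of a single metric from the Prometheus text format; the function name is illustrative, and it assumes samples carry no trailing timestamp.

```shell
# Sum all samples of one metric from Prometheus text format on stdin.
# Labels are ignored, so every label combination of the metric is added up.
# Exits non-zero if the metric does not appear at all.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP and TYPE lines
    NF >= 2 {
      metric = $1
      sub(/[{].*/, "", metric)         # strip the label set
      if (metric == name) { total += $NF; found = 1 }
    }
    END { if (found) printf "%.0f\n", total; else exit 1 }
  '
}
```

For example, `curl -s http://localhost:8080/prometheus | sum_metric jvm_memory_used_bytes` prints the total used memory across all reported memory pools in bytes.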

High memory usage

Symptom: Proxy consumes excessive memory.

Diagnostic:

# Check heap usage via Prometheus
curl http://localhost:8080/prometheus | grep jvm_memory_used_bytes

# Take heap dump (JAR only, requires JVM tools)
jmap -dump:live,format=b,file=heap.bin <pid>

Solutions:

  1. Too many buffered messages

    Check the proxy.maxMessageSize setting. Larger messages require more memory to buffer.

  2. Connection leak

    Monitor the connection count over time. It should stay roughly stable under a steady workload.

    watch -n 5 'curl -s http://localhost:8080/connections | jq length'

  3. Increase heap size (JAR only)

    java -Xmx2g -Xms1g -jar axon-server-proxy.jar

    This does not apply to the native binary.
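
Watching the connection count by eye makes a slow leak easy to miss. As a rough sketch, a recorded series of counts can be classified automatically; the function name and the three labels are illustrative, and a "growing" result is only a hint that warrants a closer look, not proof of a leak.

```shell
# Read one connection count per line and classify the series.
# "growing": the count never dropped and ended higher than it started.
# "stable":  the count dropped at least once or ended where it started.
classify_trend() {
  awk '
    NR == 1 { first = $1; prev = $1; next }
    { if ($1 < prev) dropped = 1; prev = $1 }
    END {
      if (NR < 2) print "insufficient data"
      else if (!dropped && prev > first) print "growing"
      else print "stable"
    }
  '
}
```

A sampling loop such as `for i in 1 2 3 4 5 6; do curl -s http://localhost:8080/connections | jq length; sleep 10; done | classify_trend` feeds it six samples taken ten seconds apart.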

Message delivery issues

Messages not being delivered

Symptom: Commands/queries/events don’t reach their destination.

Diagnostic:

  1. Enable debug logging

    Add a logback.xml to the working directory with DEBUG level for the proxy package, then restart:

    <configuration>
      <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
          <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%thread] %logger{36} - %msg%n</pattern>
        </encoder>
      </appender>
      <logger name="io.axoniq.proxy" level="DEBUG"/>
      <root level="INFO">
        <appender-ref ref="CONSOLE"/>
      </root>
    </configuration>

  2. Verify connections exist

    curl http://localhost:8080/connections

  3. Check Prometheus metrics for errors

    curl http://localhost:8080/prometheus | grep grpc

Solutions:

  1. Message too large

    Error: RESOURCE_EXHAUSTED: message exceeds maximum size

    Increase message size limit:

    proxy.maxMessageSize=20MB

    Also check the corresponding limit on Axon Server, since a message must fit within both.

  2. Flow control issues

    Check the logs for signs of back-pressure, which may indicate slow consumers.

  3. Context mismatch

    Verify application and Axon Server use the same context name.

Duplicate message delivery

Symptom: Same message delivered multiple times.

This is typically an application-level issue, not proxy-related. The proxy forwards messages transparently and doesn’t introduce duplicates.

Check:

  • Event handler idempotency

  • Command handler behavior

  • Axon Server event processor state

Operational issues

Proxy won’t stop gracefully

Symptom: Proxy hangs during shutdown.

Solution:

  1. Wait for timeout

    The proxy.disconnectTimeout controls graceful shutdown (value in seconds):

    proxy.disconnectTimeout=30

  2. Force shutdown

    Use this only as a last resort, because SIGKILL skips the graceful disconnect entirely:

    kill -9 <pid>

  3. Reduce timeout

    For faster shutdowns (in non-production):

    proxy.disconnectTimeout=5

Health check fails

Symptom: /health returns 503.

Check:

  1. Readiness state

    curl http://localhost:8080/health/readiness

  2. Individual components

    curl http://localhost:8080/health | jq '.components'

Common causes:

  • Unable to reach backend Axon Server

  • Application still starting up
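
Because a 503 during startup is expected, deployment scripts usually poll the readiness endpoint instead of checking it once. The following is a minimal sketch of a generic retry helper; the function name and the attempt and delay values are illustrative.

```shell
# Run a command until it succeeds.
# Arguments: maximum attempts, delay in seconds between attempts, command...
# Returns 1 if the command still fails on the last attempt.
retry() {
  local attempts delay n
  attempts="$1"; delay="$2"; n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 30 2 curl -fsS -o /dev/null http://localhost:8080/health/readiness` waits up to about a minute for the proxy to report ready. The -f flag makes curl exit non-zero on HTTP error responses such as 503.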

Metrics not available

Symptom: /prometheus returns 404 or monitoring endpoint is unreachable.

Solution:

Verify the monitoring endpoint is enabled and check the configured port:

proxy.monitoring.enabled=true
proxy.monitoring.port=8080

If proxy.monitoring.host is set to 127.0.0.1, the endpoint is only reachable from the local machine.

Debugging

Enable debug logging

Add a logback.xml to the working directory (or specify via -Dlogback.configurationFile) and restart the proxy:

<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%thread] %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <logger name="io.axoniq.proxy" level="DEBUG"/>
  <logger name="io.grpc" level="DEBUG"/>
  <root level="INFO">
    <appender-ref ref="CONSOLE"/>
  </root>
</configuration>

Capture network traffic

For deep debugging, capture gRPC traffic:

# Using tcpdump
sudo tcpdump -i any -w capture.pcap port 8124

# Analyze with Wireshark (supports gRPC decoding)
wireshark capture.pcap

Thread dumps

If the proxy appears hung:

# Get thread dump (JAR only)
jstack <pid> > thread-dump.txt

# Or multiple dumps over time
for i in {1..5}; do
  jstack <pid> > thread-dump-$i.txt
  sleep 5
done

Heap dumps

For memory issues (JAR only):

# Generate heap dump
jmap -dump:live,format=b,file=heap.bin <pid>

# Analyze with tools like Eclipse MAT or VisualVM

Common error messages

Error                     | Meaning                              | Solution
UNAVAILABLE: io exception | Network connectivity problem         | Check network, DNS, firewall
RESOURCE_EXHAUSTED        | Message too large or rate limited    | Increase maxMessageSize
PERMISSION_DENIED         | Authentication/authorization failure | Check Axon Server access control
DEADLINE_EXCEEDED         | Operation timeout                    | Check network latency, backend performance
FAILED_PRECONDITION       | Invalid request state                | Check application logic
Address already in use    | Port conflict                        | Change port or stop conflicting process

Getting help

If issues persist:

  1. Check logs with DEBUG level enabled

  2. Collect diagnostics:

    • Configuration file

    • Health check output (/health)

    • Prometheus metrics snapshot (/prometheus)

    • Recent logs

    • Thread dump (if hung, JAR only)

  3. Check Axon Server status: Proxy issues are sometimes backend issues

  4. Community support: https://discuss.axoniq.io

  5. Enterprise support: Contact AxonIQ support with diagnostics
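
Collecting the diagnostics listed above can be scripted so that every report contains the same files. The sketch below gathers the configuration file and the /health and /prometheus output into one archive; the function name, archive name, and directory layout are assumptions, and logs and thread dumps are left out because their location depends on the deployment.

```shell
# Collect a diagnostics bundle into a tar.gz archive and print its path.
# Arguments: configuration file, monitoring base URL, output directory.
collect_diagnostics() {
  local config base_url out_dir work
  config="$1"; base_url="$2"; out_dir="$3"
  work="$out_dir/proxy-diagnostics"
  mkdir -p "$work"
  cp "$config" "$work/" 2>/dev/null \
    || echo "config not found: $config" > "$work/config.missing"
  # Endpoint failures are recorded in .err files instead of aborting.
  curl -sS --max-time 10 "$base_url/health" \
    > "$work/health.json" 2> "$work/health.err" || true
  curl -sS --max-time 10 "$base_url/prometheus" \
    > "$work/prometheus.txt" 2> "$work/prometheus.err" || true
  tar -czf "$out_dir/proxy-diagnostics.tar.gz" -C "$out_dir" proxy-diagnostics
  echo "$out_dir/proxy-diagnostics.tar.gz"
}
```

For example, `collect_diagnostics proxy.properties http://localhost:8080 /tmp/diag` writes /tmp/diag/proxy-diagnostics.tar.gz. Review the copied configuration for secrets such as key paths or tokens before sharing the archive.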

Prevention best practices

  • Monitor health and metrics continuously

  • Set up alerting for anomalies

  • Use proper resource limits (memory, CPU)

  • Keep TLS certificates current

  • Test configuration changes in non-production first

  • Document your specific deployment setup

  • Keep logs for historical troubleshooting