Orchestrator Deployment Guide

The Orchestrator is a separate process that manages Docker containers for deployed applications.

Overview

The Orchestrator handles app lifecycle management:

  • Listens for app.installed events -> Pulls images, creates containers
  • Listens for app.uninstalled events -> Stops and removes containers
  • Monitors container health and restarts containers as needed

┌─────────────────┐     Events      ┌──────────────────┐
│    AppServer    │ ───────────────▶│   Orchestrator   │
│   (Main API)    │                 │  (Docker Mgmt)   │
└────────┬────────┘                 └────────┬─────────┘
         │                                   │
         │                                   │
         ▼                                   ▼
┌─────────────────┐                 ┌──────────────────┐
│    RabbitMQ     │                 │  Docker Daemon   │
│   (Event Bus)   │                 │  (/var/run/      │
└─────────────────┘                 │   docker.sock)   │
                                    └──────────────────┘

When to Use

The Orchestrator is REQUIRED for Docker deployments. It:

  • Manages Docker containers for deployed applications
  • Handles automatic container lifecycle management
  • Provides isolated runtime environments for apps
  • Responds to app installation/uninstallation events

Requirements

The Orchestrator has strict requirements:

Requirement   Description
RabbitMQ      REQUIRED - Event bus must be enabled
Docker        REQUIRED - Docker daemon access via socket
AppServer     Main server must be running and publishing events

The Orchestrator will fail to start if:

  • APPSERVER_EVENTBUS_ENABLED is not true
  • APPSERVER_DOCKER_ENABLED is not true
  • Docker socket is not accessible
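
The three conditions above can be sketched as a startup check. This is a minimal illustration, not the Orchestrator's actual code: the function name validateStartup is hypothetical, the error texts are borrowed from the Troubleshooting section, and os.Stat only confirms that the socket path exists, not that the daemon answers.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// validateStartup mirrors the three documented failure conditions.
// getenv is injected so the check can run without touching the real
// process environment. The name and structure are assumptions.
func validateStartup(getenv func(string) string) error {
	if getenv("APPSERVER_EVENTBUS_ENABLED") != "true" {
		return errors.New("EventBus must be enabled")
	}
	if getenv("APPSERVER_DOCKER_ENABLED") != "true" {
		return errors.New("Docker orchestration must be enabled")
	}
	socket := getenv("APPSERVER_DOCKER_SOCKET_PATH")
	if socket == "" {
		socket = "/var/run/docker.sock" // path used throughout this guide
	}
	// Existence check only; a real check would also talk to the daemon.
	if _, err := os.Stat(socket); err != nil {
		return fmt.Errorf("Docker socket %q is not accessible: %w", socket, err)
	}
	return nil
}

func main() {
	// Demo environment with the event bus left disabled.
	env := map[string]string{"APPSERVER_DOCKER_ENABLED": "true"}
	err := validateStartup(func(key string) string { return env[key] })
	fmt.Println("startup check:", err)
}
```

Checking the two flags before the socket keeps configuration mistakes distinguishable from host-level permission problems.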

Entry Point

Source: cmd/orchestrator/main.go (builds the orchestrator binary)

Build and run:

# Build
go build -o orchestrator ./cmd/orchestrator

# Run (requires Docker socket access)
./orchestrator

Environment Variables

Required

# Enable orchestration features
APPSERVER_DOCKER_ENABLED=true
APPSERVER_EVENTBUS_ENABLED=true

# RabbitMQ connection
APPSERVER_RABBITMQ_URL=amqp://guest:guest@rabbitmq:5672/

# Docker configuration
APPSERVER_DOCKER_SOCKET_PATH=/var/run/docker.sock
APPSERVER_DOCKER_NETWORK_NAME=appserver-network

Container Registry (for pulling images)

APPSERVER_DOCKER_REGISTRY_URL=registry.eacore6.de
APPSERVER_DOCKER_REGISTRY_USERNAME=your-registry-user
APPSERVER_DOCKER_REGISTRY_PASSWORD=your-registry-password

Resource Limits

APPSERVER_DOCKER_DEFAULT_CPU_SHARES=1024
APPSERVER_DOCKER_DEFAULT_MEMORY_MB=512

Timeouts

APPSERVER_DOCKER_PULL_TIMEOUT=10m
APPSERVER_DOCKER_START_TIMEOUT=2m
APPSERVER_DOCKER_STOP_TIMEOUT=10s

Health Monitoring

APPSERVER_DOCKER_HEALTH_CHECK_INTERVAL=30s
APPSERVER_DOCKER_HEALTH_CHECK_TIMEOUT=5s
APPSERVER_DOCKER_HEALTH_CHECK_RETRIES=3
APPSERVER_DOCKER_MAX_RESTARTS=3
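
How the retry and restart limits interact is not spelled out in this guide. The sketch below shows one plausible reading: a container is restarted after HEALTH_CHECK_RETRIES consecutive failed checks, and the monitor gives up once MAX_RESTARTS restarts have been used. The types, action names, and counting rule are all assumptions made for illustration.

```go
package main

import "fmt"

// healthPolicy holds the two documented thresholds.
type healthPolicy struct {
	retries     int // APPSERVER_DOCKER_HEALTH_CHECK_RETRIES
	maxRestarts int // APPSERVER_DOCKER_MAX_RESTARTS
}

// tracker counts consecutive failures and restarts for one container.
type tracker struct {
	policy   healthPolicy
	failures int
	restarts int
}

// observe records one health check result and returns the action to take:
// "ok", "wait", "restart", or "give-up" (hypothetical action names).
func (t *tracker) observe(healthy bool) string {
	if healthy {
		t.failures = 0
		return "ok"
	}
	t.failures++
	if t.failures < t.policy.retries {
		return "wait" // not enough consecutive failures yet
	}
	t.failures = 0
	if t.restarts >= t.policy.maxRestarts {
		return "give-up" // restart budget exhausted
	}
	t.restarts++
	return "restart"
}

func main() {
	t := &tracker{policy: healthPolicy{retries: 3, maxRestarts: 3}}
	for i := 1; i <= 4; i++ {
		fmt.Printf("check %d: %s\n", i, t.observe(false))
	}
}
```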

Image Configuration

# Image stage/tag to pull (default: latest)
# Valid values: latest, pre-release, testing
APPSERVER_DOCKER_DEFAULT_IMAGE_STAGE=latest

Startup Reconciliation

# Timeout for startup reconciliation (default: 2m)
APPSERVER_DOCKER_RECONCILE_TIMEOUT=2m

Image Configuration

The orchestrator pulls Docker images from a configurable registry with configurable tags.

Registry URL

Set the container registry URL:

APPSERVER_DOCKER_REGISTRY_URL=registry.eacore6.de

Images are referenced as: {registry_url}/{app_name}:{stage}

Example: registry.eacore6.de/todos:latest

Image Stage/Tag

Control which image tag to pull globally:

# Default: latest
# Options: latest, pre-release, testing
APPSERVER_DOCKER_DEFAULT_IMAGE_STAGE=latest

Per-App Override: Individual apps can override the stage via the Settings API:

mutation {
  updateSettings(
    appID: "app-uuid"
    settings: { "docker.image.stage": "testing" }
  ) {
    success
  }
}

This lets you run specific apps on pre-release or testing builds while all other apps stay on the global default stage.

Startup Reconciliation

When the orchestrator starts, it automatically reconciles container state to ensure installed apps are running:

  1. Queries all apps with state INSTALLED
  2. Checks if their containers are running in Docker
  3. Starts any stopped containers
  4. Waits for health checks to pass
  5. Updates deployment state in database

Configuration:

# Max time for startup reconciliation (default: 2m)
APPSERVER_DOCKER_RECONCILE_TIMEOUT=2m

This ensures apps recover automatically after orchestrator restarts without waiting for health monitor timeouts.
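
The reconciliation steps can be sketched against a small interface. Everything named here (Runtime, reconcile, fakeRuntime) is hypothetical and stands in for the real Docker and database access; the state strings "healthy" and "failed" are the ones that appear in the deployment queries later in this guide.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Runtime is a hypothetical abstraction over the Docker operations needed.
type Runtime interface {
	IsRunning(ctx context.Context, app string) (bool, error)
	Start(ctx context.Context, app string) error
	WaitHealthy(ctx context.Context, app string) error
}

// reconcile processes every installed app under one overall deadline.
// timeoutSpec uses the APPSERVER_DOCKER_RECONCILE_TIMEOUT format, e.g. "2m".
func reconcile(rt Runtime, installed []string, timeoutSpec string) (map[string]string, error) {
	timeout, err := time.ParseDuration(timeoutSpec)
	if err != nil {
		return nil, fmt.Errorf("invalid reconcile timeout %q: %w", timeoutSpec, err)
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	states := make(map[string]string, len(installed))
	for _, app := range installed {
		states[app] = reconcileOne(ctx, rt, app)
	}
	return states, nil
}

// reconcileOne returns the deployment state to record for one app.
func reconcileOne(ctx context.Context, rt Runtime, app string) string {
	running, err := rt.IsRunning(ctx, app)
	if err != nil {
		return "failed"
	}
	if !running {
		if err := rt.Start(ctx, app); err != nil {
			return "failed"
		}
	}
	if err := rt.WaitHealthy(ctx, app); err != nil {
		return "failed"
	}
	return "healthy"
}

// fakeRuntime is an in-memory stand-in used for the demo.
type fakeRuntime struct{ running map[string]bool }

func (f *fakeRuntime) IsRunning(_ context.Context, app string) (bool, error) {
	return f.running[app], nil
}

func (f *fakeRuntime) Start(_ context.Context, app string) error {
	f.running[app] = true
	return nil
}

func (f *fakeRuntime) WaitHealthy(ctx context.Context, app string) error {
	if err := ctx.Err(); err != nil {
		return err // overall reconciliation deadline exceeded
	}
	if !f.running[app] {
		return fmt.Errorf("%s is not running", app)
	}
	return nil
}

func main() {
	rt := &fakeRuntime{running: map[string]bool{"todos": true}}
	states, err := reconcile(rt, []string{"todos", "de.easy-m.statistics"}, "2m")
	fmt.Println(states, err)
}
```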

RabbitMQ Queue Configuration

The Orchestrator uses a separate queue prefix from the AppServer to ensure reliable event delivery.

Why Separate Queues Matter

RabbitMQ uses round-robin delivery when multiple consumers share the same queue. Without separate queues:

  • Both AppServer and Orchestrator would compete for the same messages
  • The Orchestrator might never receive app.installed events
  • Events could be processed by the wrong service
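
A small simulation makes the effect concrete. It models only the round-robin rule described above; real RabbitMQ delivery also depends on prefetch and acknowledgement timing, and the function name deliver is made up for this sketch.

```go
package main

import "fmt"

// deliver hands out n messages from one queue to its consumers in
// round-robin order and returns how many each consumer received.
func deliver(n int, consumers []string) map[string]int {
	got := make(map[string]int)
	if len(consumers) == 0 {
		return got // nobody is consuming; messages would stay queued
	}
	for i := 0; i < n; i++ {
		got[consumers[i%len(consumers)]]++
	}
	return got
}

func main() {
	// Shared queue: both services consume from the same queue,
	// so each sees only part of the events.
	fmt.Println("shared queue:", deliver(4, []string{"appserver", "orchestrator"}))

	// Separate queues bound to the same routing key: the exchange
	// places a copy of every event in each queue.
	fmt.Println("appserver queue:", deliver(4, []string{"appserver"}))
	fmt.Println("orchestrator queue:", deliver(4, []string{"orchestrator"}))
}
```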

Queue Naming

Service        Queue Prefix               Example Queue
AppServer      appserver.subscriber.      appserver.subscriber.app.installed
Orchestrator   orchestrator.subscriber.   orchestrator.subscriber.app.installed

This configuration is set in pkg/v2/orchestrator/services.go:

rabbitmqConfig.SubscriberPrefix = "orchestrator.subscriber."

Verifying Queue Setup

Check that orchestrator queues exist with correct bindings:

# List orchestrator queues
curl -s -u guest:guest "http://localhost:15672/api/queues" | \
  jq '.[] | select(.name | contains("orchestrator")) | .name'

# Check bindings for app.installed
curl -s -u guest:guest \
  "http://localhost:15672/api/exchanges/%2f/appserver.events/bindings/source" | \
  jq '.[] | select(.destination | contains("orchestrator"))'

Expected bindings:

  • orchestrator.subscriber.app.installed bound with routing key app.installed
  • orchestrator.subscriber.app.uninstalled bound with routing key app.uninstalled
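
The same check can be automated by decoding the bindings response. The sketch below reads only the destination and routing_key fields of each binding returned by the management API; the function name missingBindings is hypothetical, and fetching the response over HTTP is left out so the example stays self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// binding holds the two fields of interest from each binding object.
type binding struct {
	Destination string `json:"destination"`
	RoutingKey  string `json:"routing_key"`
}

// missingBindings returns the expected routing keys for which no
// orchestrator.subscriber.<key> queue binding exists in the response body.
func missingBindings(body []byte) ([]string, error) {
	var bindings []binding
	if err := json.Unmarshal(body, &bindings); err != nil {
		return nil, fmt.Errorf("decode bindings: %w", err)
	}
	found := make(map[string]bool)
	for _, b := range bindings {
		if b.Destination == "orchestrator.subscriber."+b.RoutingKey {
			found[b.RoutingKey] = true
		}
	}
	var missing []string
	for _, key := range []string{"app.installed", "app.uninstalled"} {
		if !found[key] {
			missing = append(missing, key)
		}
	}
	return missing, nil
}

func main() {
	// Sample response containing only the app.installed binding.
	body := []byte(`[{"destination":"orchestrator.subscriber.app.installed","routing_key":"app.installed"}]`)
	missing, err := missingBindings(body)
	fmt.Println("missing bindings:", missing, err)
}
```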

Event Flow

Understanding the complete event flow helps with debugging:

┌──────────────┐    1. installApp     ┌──────────────┐
│    Client    │ ────────────────────▶│  AppServer   │
│  (GraphQL)   │                      │              │
└──────────────┘                      └──────┬───────┘
                                             │
                                             │ 2. Update DB state
                                             │ 3. Publish event
                                             ▼
                                      ┌──────────────┐
                                      │   RabbitMQ   │
                                      │              │
                                      └──────┬───────┘
                                             │
                                             │ 4. Route to queue
                                             ▼
┌──────────────┐       6. Start       ┌──────────────┐
│    Docker    │ ◀────────────────────│ Orchestrator │
│  Container   │                      │              │
└──────────────┘                      └──────────────┘
                                      5. Extract AppName
                                         from payload

Step-by-Step Flow

  1. Client Request: User calls installApp(name: "de.easy-m.statistics") mutation
  2. AppServer Processing:
    • Updates app state to installed in database
    • Publishes app.installed event to RabbitMQ
  3. Event Routing: RabbitMQ routes event to orchestrator.subscriber.app.installed queue
  4. Orchestrator Receives: Event handler extracts AppName from JSON payload
  5. Container Deployment:
    • Pulls Docker image from registry
    • Creates container with environment variables
    • Starts container and waits for health check
  6. App Connects: Container starts, app connects back to AppServer

Event Subscriptions

The Orchestrator subscribes to these events from RabbitMQ:

app.installed

Triggered when an app is installed via the Marketplace:

  1. Resolve app dependencies
  2. Pull Docker image from registry
  3. Create and configure container
  4. Start container
  5. Wait for health check to pass

app.uninstalled

Triggered when an app is uninstalled:

  1. Stop running container
  2. Remove container
  3. Clean up resources
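
Both handlers are ordered pipelines in which a failed step should prevent the later ones from running. The sketch below illustrates that shape with placeholder steps; the step type and function names are hypothetical, and the real handlers call Docker rather than no-op functions.

```go
package main

import (
	"errors"
	"fmt"
)

// step is one named action in a handler pipeline.
type step struct {
	name string
	run  func(app string) error
}

// runSteps executes steps in order and stops at the first failure.
// It returns the names of the steps that completed.
func runSteps(app string, steps []step) ([]string, error) {
	var done []string
	for _, s := range steps {
		if err := s.run(app); err != nil {
			return done, fmt.Errorf("%s failed for %s: %w", s.name, app, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

// succeed and failing are placeholders for real Docker operations.
func succeed(string) error { return nil }

func failing(msg string) func(string) error {
	return func(string) error { return errors.New(msg) }
}

// installSteps lists the app.installed actions in documented order.
func installSteps() []step {
	return []step{
		{"resolve dependencies", succeed},
		{"pull image", succeed},
		{"create container", succeed},
		{"start container", succeed},
		{"wait for health check", succeed},
	}
}

// uninstallSteps lists the app.uninstalled actions in documented order.
func uninstallSteps() []step {
	return []step{
		{"stop container", succeed},
		{"remove container", succeed},
		{"clean up resources", succeed},
	}
}

func main() {
	done, err := runSteps("todos", installSteps())
	fmt.Println(done, err)

	// Simulate a registry outage: nothing after the pull is attempted.
	steps := installSteps()
	steps[1].run = failing("registry unreachable")
	done, err = runSteps("todos", steps)
	fmt.Println(done, err)
}
```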

Environment Variables for Containers

When the Orchestrator deploys a container, it passes environment variables from its own environment plus auto-generated app-specific variables.

Environment Variable Mapping

The SDK expects specific variable names. The Orchestrator maps APPSERVER_* variables to the expected names:

Orchestrator Env        Container Env   Description
APPSERVER_DB_HOST       DB_HOST         Database host
APPSERVER_DB_PORT       DB_PORT         Database port
APPSERVER_DB_NAME       DB_NAME         Database name
APPSERVER_DB_USER       DB_USER         Database username
APPSERVER_DB_PASSWORD   DB_PASSWORD     Database password
APPSERVER_DB_SSLMODE    DB_SSL_MODE     Database SSL mode
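
The mapping table translates directly into a lookup. This is a sketch of the idea only; the actual logic lives in env_builder.go (referenced in Troubleshooting) and may differ, and the names dbEnvMapping and mapDBEnv are made up here.

```go
package main

import "fmt"

// dbEnvMapping reproduces the table above: orchestrator name -> container name.
var dbEnvMapping = map[string]string{
	"APPSERVER_DB_HOST":     "DB_HOST",
	"APPSERVER_DB_PORT":     "DB_PORT",
	"APPSERVER_DB_NAME":     "DB_NAME",
	"APPSERVER_DB_USER":     "DB_USER",
	"APPSERVER_DB_PASSWORD": "DB_PASSWORD",
	"APPSERVER_DB_SSLMODE":  "DB_SSL_MODE",
}

// mapDBEnv returns the container-side variables for every mapped
// variable present in the orchestrator environment.
func mapDBEnv(orchestratorEnv map[string]string) map[string]string {
	out := make(map[string]string)
	for src, dst := range dbEnvMapping {
		if value, ok := orchestratorEnv[src]; ok {
			out[dst] = value
		}
	}
	return out
}

func main() {
	env := map[string]string{
		"APPSERVER_DB_HOST":    "postgres",
		"APPSERVER_DB_SSLMODE": "disable",
		"POSTGRES_PASSWORD":    "not-forwarded",
	}
	fmt.Println(mapDBEnv(env))
}
```

Note the one irregular entry: APPSERVER_DB_SSLMODE becomes DB_SSL_MODE, so a plain prefix strip would not be enough.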

Auto-Generated Variables

The Orchestrator automatically generates these variables for each container:

# App identity
APP_ID=<uuid> # Unique app identifier
APP_NAME=de.easy-m.statistics # App name

# AppServer connection
APPSERVER_HTTP_URL=http://appserver:8080
APPSERVER_GRPC_ADDRESS=appserver:9091
APPSERVER_GRAPHQL_URL=http://appserver:8080/graphql
APPSERVER_GRAPHQL_HTTP=http://appserver:8080/graphql # SDK expects this

# App upstream URL (where the app should listen)
UPSTREAM_DE_EASY_M_STATISTICS=http://app-de.easy-m.statistics:3000

# Default port
PORT=3000
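
A sketch of how such a set could be generated for one app. The rule for the UPSTREAM_ variable name (uppercase the app name and replace every non-alphanumeric character with an underscore) is inferred from the single example above and is an assumption, as are the function names.

```go
package main

import (
	"fmt"
	"strings"
)

// upstreamVarName derives the UPSTREAM_* variable name from an app name.
// The replacement rule is inferred from the documented example.
func upstreamVarName(appName string) string {
	var b strings.Builder
	b.WriteString("UPSTREAM_")
	for _, r := range strings.ToUpper(appName) {
		if (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
		} else {
			b.WriteByte('_')
		}
	}
	return b.String()
}

// appEnv builds the auto-generated variables using the values shown above.
func appEnv(appID, appName string) map[string]string {
	return map[string]string{
		"APP_ID":                 appID,
		"APP_NAME":               appName,
		"APPSERVER_HTTP_URL":     "http://appserver:8080",
		"APPSERVER_GRPC_ADDRESS": "appserver:9091",
		"APPSERVER_GRAPHQL_URL":  "http://appserver:8080/graphql",
		"APPSERVER_GRAPHQL_HTTP": "http://appserver:8080/graphql",
		upstreamVarName(appName): "http://app-" + appName + ":3000",
		"PORT":                   "3000",
	}
}

func main() {
	for key, value := range appEnv("example-id", "de.easy-m.statistics") {
		fmt.Printf("%s=%s\n", key, value)
	}
}
```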

Production vs Development Mode

The NODE_ENV variable controls production/development behavior:

# In orchestrator environment
NODE_ENV=production # Apps skip asset building, use pre-built assets
NODE_ENV=development # Apps attempt to build assets (may fail in containers)

Important: Set NODE_ENV=production in the orchestrator environment for deployed containers. Development mode attempts to run build tools that aren't available in production Docker images.

Configuring NODE_ENV

Add to your docker-compose.yml orchestrator service:

orchestrator:
  environment:
    NODE_ENV: production # Critical for production deployments
    # ... other variables

Security: Filtered Variables

For security, only specific prefixes are passed to containers:

Allowed Prefixes:

  • APPSERVER_DB_* - Scoped database access
  • APPSERVER_REDIS_* - Scoped Redis access
  • APPSERVER_EVENTBUS_* - Event bus access
  • APPSERVER_HTTP_URL / APPSERVER_GRPC_ADDRESS - AppServer endpoints

Blocked (Never Forwarded):

  • POSTGRES_* - Direct database credentials
  • REDIS_* - Direct Redis credentials
  • KRATOS_* / HYDRA_* - Direct auth service access

This ensures third-party apps cannot access infrastructure directly—they must go through the AppServer's scoped APIs.
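
The filter can be expressed as an allowlist, which blocks POSTGRES_*, REDIS_*, KRATOS_* and HYDRA_* implicitly because they match none of the allowed entries. This is a sketch of the listed rules only; how NODE_ENV reaches containers is not covered by the list, so it is left out, and the function name forwardable is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedPrefixes and allowedExact mirror the "Allowed Prefixes" list.
var allowedPrefixes = []string{"APPSERVER_DB_", "APPSERVER_REDIS_", "APPSERVER_EVENTBUS_"}

var allowedExact = map[string]bool{
	"APPSERVER_HTTP_URL":     true,
	"APPSERVER_GRPC_ADDRESS": true,
}

// forwardable reports whether a variable may be passed to app containers.
// Anything not explicitly allowed is dropped.
func forwardable(name string) bool {
	if allowedExact[name] {
		return true
	}
	for _, prefix := range allowedPrefixes {
		if strings.HasPrefix(name, prefix) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{"APPSERVER_DB_HOST", "POSTGRES_PASSWORD", "REDIS_URL", "APPSERVER_REDIS_URL"} {
		fmt.Printf("%-20s forwarded=%v\n", name, forwardable(name))
	}
}
```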

Container Lifecycle

┌────────────┐     install     ┌─────────────┐
│  Pending   │ ───────────────▶│   Pulling   │
└────────────┘                 └──────┬──────┘
                                      │
                                      ▼
                               ┌─────────────┐
                               │  Creating   │
                               └──────┬──────┘
                                      │
                                      ▼
┌────────────┐    unhealthy    ┌─────────────┐
│ Restarting │ ◀───────────────│   Running   │
└──────┬─────┘                 └──────┬──────┘
       │                              │
       │ healthy                      │ uninstall
       └──────────────────────────────┼──────────▶ Stopped
                                      │
                                      ▼
                               ┌─────────────┐
                               │   Stopped   │
                               └─────────────┘
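
The lifecycle can be read as a small state machine. In the sketch below the state names and the install, unhealthy, healthy and uninstall events come from the diagram; the event names "pulled" and "created" label arrows the diagram leaves unnamed, and treating "healthy" as a return from Restarting to Running is an interpretation.

```go
package main

import "fmt"

// State is one container lifecycle state from the diagram.
type State string

const (
	Pending    State = "Pending"
	Pulling    State = "Pulling"
	Creating   State = "Creating"
	Running    State = "Running"
	Restarting State = "Restarting"
	Stopped    State = "Stopped"
)

// transitions maps each state to the events it accepts.
var transitions = map[State]map[string]State{
	Pending:    {"install": Pulling},
	Pulling:    {"pulled": Creating},  // assumed event name
	Creating:   {"created": Running},  // assumed event name
	Running:    {"unhealthy": Restarting, "uninstall": Stopped},
	Restarting: {"healthy": Running},
}

// next returns the state reached from current on event, or an error
// if the diagram has no such edge.
func next(current State, event string) (State, error) {
	if to, ok := transitions[current][event]; ok {
		return to, nil
	}
	return current, fmt.Errorf("no transition from %s on %q", current, event)
}

func main() {
	state := Pending
	for _, event := range []string{"install", "pulled", "created", "unhealthy", "healthy", "uninstall"} {
		to, err := next(state, event)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("%s --%s--> %s\n", state, event, to)
		state = to
	}
}
```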

Docker Deployment

orchestrator:
  image: registry.eacore6.de/orchestrator:latest
  environment:
    - APPSERVER_DOCKER_ENABLED=true
    - APPSERVER_EVENTBUS_ENABLED=true
    - APPSERVER_RABBITMQ_URL=amqp://guest:guest@rabbitmq:5672/
    - APPSERVER_DOCKER_SOCKET_PATH=/var/run/docker.sock
    - APPSERVER_DOCKER_NETWORK_NAME=appserver-network
    - APPSERVER_DOCKER_REGISTRY_URL=registry.eacore6.de
    - APPSERVER_DOCKER_REGISTRY_USERNAME=${REGISTRY_USERNAME}
    - APPSERVER_DOCKER_REGISTRY_PASSWORD=${REGISTRY_PASSWORD}
  volumes:
    # Mount Docker socket for container management
    - /var/run/docker.sock:/var/run/docker.sock:rw
  depends_on:
    rabbitmq:
      condition: service_healthy
    appserver:
      condition: service_healthy
  networks:
    - appserver-network

Security Considerations

Docker Socket Access

The Orchestrator requires write access to the Docker socket, which grants significant privileges:

  • Can create/stop/remove any container
  • Can pull images from registries
  • Can read container logs and exec into any container

Best Practices:

  • Run the Orchestrator on a dedicated host
  • Use Docker socket proxy for restricted access
  • Monitor Orchestrator logs for anomalies
  • Use read-only volumes where possible

Container Isolation

Deployed app containers are isolated:

  • Each app runs in its own container
  • Containers connect to appserver-network
  • Resource limits prevent runaway usage
  • Health checks ensure containers are responsive

Monitoring

Logs

# View orchestrator logs
docker logs orchestrator -f

# Filter for specific app
# (2>&1 is needed because docker logs replays the container's stderr on stderr)
docker logs orchestrator 2>&1 | grep "app_name=todos"

Metrics

The Orchestrator exposes metrics for:

  • Container start/stop counts
  • Pull durations
  • Health check results
  • Restart counts

Events

Monitor RabbitMQ queues:

  • orchestrator.subscriber.app.installed - Installation requests
  • orchestrator.subscriber.app.uninstalled - Uninstallation requests
  • Dead-letter queue for failed operations

Troubleshooting

Orchestrator Won't Start

Error: EventBus must be enabled

# Ensure RabbitMQ is enabled
APPSERVER_EVENTBUS_ENABLED=true
APPSERVER_RABBITMQ_URL=amqp://guest:guest@rabbitmq:5672/

Error: Docker orchestration must be enabled

# Enable Docker orchestration
APPSERVER_DOCKER_ENABLED=true

Error: Cannot connect to Docker daemon

# Check Docker socket exists and is accessible
ls -la /var/run/docker.sock

# Ensure volume mount is correct
volumes:
  - /var/run/docker.sock:/var/run/docker.sock:rw

Container Pull Failures

Error: repository not found

  • Verify image name and tag
  • Check registry credentials
  • Ensure registry is accessible from the host

Error: timeout pulling image

  • Increase APPSERVER_DOCKER_PULL_TIMEOUT
  • Check network connectivity to registry
  • Verify registry is not rate-limiting

Container Won't Start

Error: port already in use

  • Check for conflicting containers
  • Verify port mappings are unique

Error: out of memory

  • Increase APPSERVER_DOCKER_DEFAULT_MEMORY_MB
  • Free up host memory
  • Check for memory leaks in apps

Health Check Failures

Symptom: Container keeps restarting

  • Check app health endpoint is responding
  • Verify health check configuration
  • Increase APPSERVER_DOCKER_HEALTH_CHECK_RETRIES
  • Check container logs for errors:
    docker logs <container_name>

Poisoned Messages / Infinite Retry Loop

Symptom: Orchestrator logs show the same event being processed repeatedly with errors

This can happen when a malformed event is published to RabbitMQ and the orchestrator cannot process it, causing infinite redelivery.

Detection:

# Check queue message stats via RabbitMQ Management API
curl -s -u guest:guest "http://localhost:15672/api/queues/%2f/orchestrator.subscriber.app.installed" | \
  jq '{messages: .messages, redeliver: .message_stats.redeliver}'

# High redeliver count indicates poisoned messages
# Example output showing problem: {"messages": 1, "redeliver": 6000000}

Resolution - Purge the queue:

# CAUTION: This deletes ALL messages in the queue
curl -X DELETE -u guest:guest \
  "http://localhost:15672/api/queues/%2f/orchestrator.subscriber.app.installed/contents"

Prevention:

  • Ensure events are published with correct payload format
  • The app.installed event must include AppName in the JSON payload:
    {"AppName": "de.easy-m.statistics", "AppID": "uuid-here"}
  • Both AppName (PascalCase) and app_name (snake_case) are supported
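
On the consumer side, a handler can avoid the redelivery loop by treating an undecodable payload as permanently failed instead of requeueing it. The sketch below shows payload parsing that accepts both key spellings, plus a requeue decision; the type and function names are hypothetical, and how the real handler acknowledges or rejects messages is not described in this guide.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errMalformed marks payloads that can never succeed on retry.
var errMalformed = errors.New("malformed app.installed payload")

// installedEvent accepts both documented spellings of the app name.
type installedEvent struct {
	AppName      string `json:"AppName"`
	AppNameSnake string `json:"app_name"`
}

// appNameFromPayload extracts the app name or returns errMalformed.
func appNameFromPayload(payload []byte) (string, error) {
	var ev installedEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return "", fmt.Errorf("%w: %v", errMalformed, err)
	}
	switch {
	case ev.AppName != "":
		return ev.AppName, nil
	case ev.AppNameSnake != "":
		return ev.AppNameSnake, nil
	}
	return "", fmt.Errorf("%w: missing AppName", errMalformed)
}

// shouldRequeue reports whether a failed message is worth redelivering.
// Malformed payloads are not, since they would fail again every time.
func shouldRequeue(err error) bool {
	return err != nil && !errors.Is(err, errMalformed)
}

func main() {
	for _, payload := range []string{
		`{"AppName": "de.easy-m.statistics", "AppID": "uuid-here"}`,
		`{"app_name": "todos"}`,
		`{"AppID": "uuid-here"}`,
	} {
		name, err := appNameFromPayload([]byte(payload))
		fmt.Printf("name=%q err=%v requeue=%v\n", name, err, shouldRequeue(err))
	}
}
```

A message rejected without requeue can be routed to a dead-letter queue if one is configured, which keeps it available for inspection.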

Stale Deployments

Symptom: App shows as "installed" in database but container doesn't exist

This can happen if:

  • Container was manually deleted
  • Docker daemon restarted and container was not set to restart
  • Host system rebooted

Detection:

# Check deployment state in database
docker exec -it postgres psql -U partner -d partner -c \
  "SELECT a.name, d.state, d.container_id, d.health_status
   FROM appserver.deployments d
   JOIN appserver.apps a ON d.app_id = a.id
   WHERE d.state = 'healthy';"

# Verify containers actually exist (-a also lists stopped containers)
docker ps -a --filter "name=app-" --format "{{.Names}}\t{{.Status}}"

Auto-Recovery: The orchestrator automatically handles stale deployments:

  1. On startup, it reconciles all installed apps
  2. When processing install events, it verifies container existence
  3. If a deployment shows "healthy" but container doesn't exist, it:
    • Marks the deployment as failed
    • Triggers a fresh deployment

Manual Recovery: If auto-recovery doesn't work:

# 1. Mark deployment as failed in database
docker exec -it postgres psql -U partner -d partner -c \
  "UPDATE appserver.deployments SET state = 'failed'
   WHERE container_id = 'missing-container-id';"

# 2. Trigger reinstall via GraphQL
curl -X POST http://localhost:8080/graphql \
  -H "Content-Type: application/json" \
  -H "Cookie: ory_kratos_session=<session>" \
  -d '{"query": "mutation { installApp(name: \"de.easy-m.statistics\") { id } }"}'

Event Not Received by Orchestrator

Symptom: installApp mutation succeeds but orchestrator doesn't deploy container

Check 1: Verify AppServer is publishing events

# Check appserver logs for event publishing
docker logs appserver 2>&1 | grep -F "app.installed"

Check 2: Verify RabbitMQ bindings

# List orchestrator queue bindings
curl -s -u guest:guest \
  "http://localhost:15672/api/exchanges/%2f/appserver.events/bindings/source" | \
  jq '.[] | select(.destination | contains("orchestrator"))'

# Expected: binding with routing_key "app.installed"

Check 3: Verify queue separation

The orchestrator MUST use a different queue prefix than the appserver. Without this, both services share the same queue and RabbitMQ round-robins messages between them.

# List all subscriber queues
curl -s -u guest:guest "http://localhost:15672/api/queues" | \
  jq '.[].name | select(contains("subscriber"))'

# Expected output should include BOTH:
# - "appserver.subscriber.app.installed"
# - "orchestrator.subscriber.app.installed"

If only appserver.subscriber.* queues exist, check pkg/v2/orchestrator/services.go:

rabbitmqConfig.SubscriberPrefix = "orchestrator.subscriber."  // REQUIRED

Check 4: Verify orchestrator is connected

# Check RabbitMQ consumers
curl -s -u guest:guest \
  "http://localhost:15672/api/queues/%2f/orchestrator.subscriber.app.installed" | \
  jq '.consumer_details'

# Should show at least one consumer

App Container Configuration Errors

Error: database configuration is required

  • The SDK expects DB_HOST, not APPSERVER_DB_HOST
  • The orchestrator should automatically map these variables
  • Verify env_builder.go includes the DB mappings

Error: AppServerURL is required

  • The SDK expects APPSERVER_GRAPHQL_HTTP environment variable
  • Verify this is set in container environment:
    docker inspect app-de.easy-m.statistics | jq '.[0].Config.Env'

Error: NODE_ENV=development causing build failures

  • Production containers should run with NODE_ENV=production
  • Add to orchestrator environment in docker-compose:
    NODE_ENV: production

Scaling Considerations

Multiple Orchestrators

For high availability, you can run multiple Orchestrator instances:

  • Use competing consumers on RabbitMQ
  • Each instance handles different events
  • Ensure container operations are idempotent
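
Idempotency matters here because a redelivered event, or the same event seen by a second instance after a failover, must not create a second container. A minimal sketch of an idempotent start and remove, using an in-memory map in place of Docker; the dockerHost type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// dockerHost is an in-memory stand-in for the containers on one host.
type dockerHost struct {
	mu      sync.Mutex
	running map[string]bool
	starts  int // number of real start operations performed
}

// ensureRunning starts the app's container only if it is not already
// running. It reports whether a start was actually performed.
func (h *dockerHost) ensureRunning(app string) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.running[app] {
		return false // duplicate event: nothing to do
	}
	h.running[app] = true
	h.starts++
	return true
}

// ensureRemoved removes the container if present; repeating it is harmless.
func (h *dockerHost) ensureRemoved(app string) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if !h.running[app] {
		return false
	}
	delete(h.running, app)
	return true
}

func main() {
	host := &dockerHost{running: make(map[string]bool)}
	fmt.Println("first app.installed:", host.ensureRunning("todos"))
	fmt.Println("redelivered app.installed:", host.ensureRunning("todos"))
	fmt.Println("start operations:", host.starts)
}
```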

Distributed Docker Hosts

For scaling across multiple machines:

  • Use Docker Swarm or Kubernetes instead
  • Or run one Orchestrator per Docker host
  • Coordinate via shared RabbitMQ