Docker Orchestration

Built-in Docker container lifecycle management for application backend services with event-driven deployment, health monitoring, and automatic rollback.

Scope

This document covers the following packages and their interfaces:

| Layer | Packages | Key Files |
|---|---|---|
| Application | pkg/v2/application/orchestration/ | installer.go, uninstaller.go, health_monitor.go, env_builder.go, session.go |
| Domain Models | pkg/v2/domain/deployment/, pkg/v2/domain/event/ | deployment.go, orchestration_events.go, container_events.go |
| Infrastructure | pkg/v2/infrastructure/docker/ | client.go, network.go, volume.go, registry.go |
| Repositories | pkg/v2/domain/repository/ | deployment_repository.go, settings_repository.go |
| Orchestrator Binary | cmd/orchestrator/ | main.go |
| Orchestrator Service | pkg/v2/orchestrator/ | orchestrator.go, services.go |
| Configuration | pkg/v2/config/ | docker.go |

Overview

Easy AppServer provides Docker orchestration as an optional standalone binary (cmd/orchestrator/) with the following properties:

  • Event-Driven: Listens to app.installed and app.uninstalled events via RabbitMQ
  • Stateless: Multiple orchestrator instances can run concurrently for horizontal scaling
  • Reactive: Only performs actions in response to lifecycle events
  • Session Tracking: Smart rollback on installation failures
  • Health Monitoring: Continuous container health checks with automatic restart
  • Dependency-Aware: Deploys apps in correct order based on dependency graph
  • Volume Management: Persistent data across container restarts
  • Network Isolation: Shared Docker network for inter-container communication

Architecture

Layered Structure

Event Bus (RabbitMQ)
  ↓
Orchestrator Binary (standalone)
  ↓
Application Layer (installer, uninstaller, health_monitor)
  ↓
Infrastructure Layer (docker client, network, volume)
  ↓
Docker Daemon

Core Components

InstallationOrchestrator (pkg/v2/application/orchestration/installer.go:19-34)

type InstallationOrchestrator struct {
    deploymentRepo     repository.DeploymentRepository
    appRepo            repository.AppRepository
    dependencyRepo     repository.DependencyRepository
    settingsRepo       repository.SettingsRepository
    dependencyResolver service.DependencyResolver
    dockerClient       docker.Client
    registry           docker.RegistryInterface
    networkManager     *docker.NetworkManager
    volumeManager      *docker.VolumeManager
    envBuilder         *EnvBuilder
    eventBus           event.Bus
    config             config.DockerConfig
    logger             telemetry.Logger
}

Dependencies & Interactions:

  • → Event Bus (RabbitMQ): Subscribes to app.installed/uninstalled events
  • → Deployment Repository: Tracks deployment state in PostgreSQL
  • → App Repository: Fetches app manifests and dependency information
  • → Settings Repository: Loads app-specific environment variable overrides
  • → Dependency Resolver: Resolves transitive dependencies and installation order
  • → Docker Client: Manages containers, images, networks, volumes
  • → Docker Registry: Pulls container images
  • → Network Manager: Creates and manages shared Docker network
  • → Volume Manager: Creates and manages persistent volumes
  • → Env Builder: Assembles environment variables from multiple sources

UninstallationOrchestrator (pkg/v2/application/orchestration/uninstaller.go)

  • Handles graceful container shutdown
  • Checks for healthy dependents before uninstall
  • Cleans up networks and volumes
  • Soft-deletes deployment records

HealthMonitor (pkg/v2/application/orchestration/health_monitor.go)

  • Continuous monitoring loop
  • Automatic restart on container crashes
  • Max restart limit tracking
  • Health event publishing

EnvBuilder (pkg/v2/application/orchestration/env_builder.go:16-25)

type EnvBuilder struct {
    baseEnvFile   string                        // Path to .env.example
    settingsRepo  repository.SettingsRepository
    logger        telemetry.Logger
    baseEnvCache  map[string]string             // Cached base env vars
    appserverHost string                        // Docker service name
    appserverHTTP string                        // http://appserver:8080
    appserverGRPC string                        // appserver:9090
}

Installation Flow

Based on installer.go:124-300:

Process Steps

| Step | Function | File:Lines | Description |
|---|---|---|---|
| 1 | Resolve transitive dependencies | installer.go:126-129 | BFS to find all dependencies recursively |
| 2 | Get installation order | installer.go:132-135 | Topological sort ensures dependencies first |
| 3 | Load app details | installer.go:145-148 | Fetch app manifests for all apps |
| 4 | Filter already-deployed | installer.go:150-175 | Skip apps with healthy deployments |
| 5 | Ensure network exists | installer.go:177-185 | Create shared Docker network |
| 6 | Create installation session | installer.go:187-190 | Track deployments for rollback |
| 7 | Deploy each app | installer.go:192-268 | Deploy in dependency order |
| 8 | Publish success events | installer.go:270-280 | Notify completion |
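
Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the code in installer.go: the Graph type, the function names, and the app names are all hypothetical, and the real DependencyResolver may use a different algorithm for the topological sort. The sketch collects transitive dependencies breadth-first, then orders them with a depth-first post-order so that every app appears after its dependencies.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph maps an app name to the apps it directly depends on.
// The type and the app names used below are hypothetical.
type Graph map[string][]string

// transitiveDeps walks the graph breadth-first from target and returns
// every app reachable from it, including target itself.
func transitiveDeps(g Graph, target string) map[string]bool {
	seen := map[string]bool{target: true}
	queue := []string{target}
	for len(queue) > 0 {
		app := queue[0]
		queue = queue[1:]
		for _, dep := range g[app] {
			if !seen[dep] {
				seen[dep] = true
				queue = append(queue, dep)
			}
		}
	}
	return seen
}

// installOrder returns the apps in needed so that each app appears after
// all of its dependencies. It reports an error if it finds a cycle.
func installOrder(g Graph, needed map[string]bool) ([]string, error) {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := map[string]int{}
	var order []string

	var visit func(app string) error
	visit = func(app string) error {
		switch state[app] {
		case done:
			return nil
		case inProgress:
			return fmt.Errorf("dependency cycle at %q", app)
		}
		state[app] = inProgress
		deps := append([]string(nil), g[app]...)
		sort.Strings(deps) // deterministic order among siblings
		for _, dep := range deps {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[app] = done
		order = append(order, app)
		return nil
	}

	names := make([]string, 0, len(needed))
	for app := range needed {
		names = append(names, app)
	}
	sort.Strings(names) // map iteration order is unspecified
	for _, app := range names {
		if err := visit(app); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	g := Graph{
		"todos-app": {"auth", "storage"},
		"auth":      {"storage"},
		"storage":   nil,
		"analytics": {"storage"}, // not required by todos-app
	}
	order, err := installOrder(g, transitiveDeps(g, "todos-app"))
	fmt.Println(order, err) // [storage auth todos-app] <nil>
}
```

Note that analytics is left out of the result because it is not reachable from the target app.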

Deployment Sub-Steps

For each app in installation order (deployApp function, lines 282-450):

| Sub-Step | Function | Description |
|---|---|---|
| 1 | Create deployment record | Database tracking with Created state |
| 2 | Publish image pull start | Event notification |
| 3 | Pull Docker image | From registry with progress tracking |
| 4 | Update state to PullingImage | Database state transition |
| 5 | Publish image pull complete | Success event |
| 6 | Build environment variables | 3-layer precedence (base, overrides, auto-gen) |
| 7 | Create volume | Persistent storage /app/data |
| 8 | Create container | With full config (env, labels, resources) |
| 9 | Update state to CreatingContainer | Database state transition |
| 10 | Publish container created | Event notification |
| 11 | Connect to network | Attach to shared network |
| 12 | Start container | Docker container start |
| 13 | Update state to Starting | Database state transition |
| 14 | Publish container started | Event notification |
| 15 | Wait for health | Check container running state |
| 16 | Mark as healthy | Update deployment state to Healthy |
| 17 | Publish container healthy | Success event |
| 18 | Register in session | Track for potential rollback |

Installation Session

From session.go:

type InstallationSession struct {
    TargetAppID    uuid.UUID
    TargetAppName  string
    mu             sync.RWMutex            // Thread-safe operations
    deployments    map[uuid.UUID]string    // deploymentID -> containerID
    appDeployments map[uuid.UUID]uuid.UUID // appID -> deploymentID
}

Purpose:

  • Tracks only deployments created in current installation
  • Enables surgical rollback on failure
  • Preserves already-installed shared dependencies
  • Thread-safe with RWMutex

Rollback Process (lines 85-150):

For each deployment in session (reverse order):
1. Stop container (10s graceful timeout)
2. Remove container (forced if needed)
3. Mark deployment as Failed
4. Publish failure events
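
The following sketch shows one way a session could register deployments and roll them back in reverse order. It is a simplified stand-in, not the code in session.go: it uses string IDs instead of uuid.UUID, and the order slice is an assumption added here because Go does not define an iteration order for maps, so a map alone cannot provide "reverse order".

```go
package main

import (
	"fmt"
	"sync"
)

// session is a simplified, hypothetical stand-in for InstallationSession.
type session struct {
	mu          sync.RWMutex
	deployments map[string]string // deploymentID -> containerID
	order       []string          // deployment IDs in registration order (assumption)
}

func newSession() *session {
	return &session{deployments: make(map[string]string)}
}

// register records a deployment created by this installation.
// Re-registering an ID updates the container but keeps its position.
func (s *session) register(deploymentID, containerID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.deployments[deploymentID]; !exists {
		s.order = append(s.order, deploymentID)
	}
	s.deployments[deploymentID] = containerID
}

// rollback calls undo for every registered container, newest first, and
// returns the container IDs in the order they were processed.
func (s *session) rollback(undo func(containerID string) error) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var processed []string
	for i := len(s.order) - 1; i >= 0; i-- {
		containerID := s.deployments[s.order[i]]
		if err := undo(containerID); err != nil {
			// Keep going: one failed cleanup should not block the rest.
			fmt.Printf("rollback of %s failed: %v\n", containerID, err)
		}
		processed = append(processed, containerID)
	}
	return processed
}

func main() {
	s := newSession()
	s.register("dep-1", "container-storage")
	s.register("dep-2", "container-auth")
	s.register("dep-3", "container-todos")

	undone := s.rollback(func(containerID string) error {
		// A real implementation would stop and remove the container here.
		return nil
	})
	fmt.Println(undone) // [container-todos container-auth container-storage]
}
```

Because only deployments registered in this session are touched, shared dependencies that were already installed before the session started are left alone.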

Uninstallation Flow

Based on uninstaller.go:60-200:

Process Steps

| Step | Function | File:Lines | Description |
|---|---|---|---|
| 1 | Find deployment | uninstaller.go:70-83 | Locate active deployment record |
| 2 | Check dependents | uninstaller.go:85-113 | Block if healthy apps depend on this |
| 3 | Publish start event | uninstaller.go:115-122 | Notify uninstall beginning |
| 4 | Stop container | uninstaller.go:125-135 | Graceful stop with SIGTERM, 10s timeout |
| 5 | Remove container | uninstaller.go:137-147 | Force remove if needed |
| 6 | Disconnect network | uninstaller.go:149-159 | Detach from shared network |
| 7 | Clean up volume | uninstaller.go:161-171 | Delete persistent volume |
| 8 | Update deployment state | uninstaller.go:173-182 | Transition to Stopped |
| 9 | Soft delete | uninstaller.go:184-192 | Set deleted_at timestamp |
| 10 | Publish complete event | uninstaller.go:194-201 | Notify success |

Safety Guards

Dependent Check (checkForHealthyDependents, lines 203-250):

  • Finds all apps with dependencies on target app
  • Checks if any are currently healthy
  • Returns error if healthy dependents exist
  • Prevents cascade failures
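
A minimal sketch of the dependent check, assuming the dependents and their health have already been loaded. The dependent type and the function signature are hypothetical; the real checkForHealthyDependents reads from the repositories.

```go
package main

import "fmt"

// dependent describes an app that depends on the uninstall target.
// This type is hypothetical and exists only for the sketch.
type dependent struct {
	Name    string
	Healthy bool
}

// checkForHealthyDependents returns an error naming every healthy dependent,
// or nil when the target can be removed safely.
func checkForHealthyDependents(target string, dependents []dependent) error {
	var blocking []string
	for _, d := range dependents {
		if d.Healthy {
			blocking = append(blocking, d.Name)
		}
	}
	if len(blocking) > 0 {
		return fmt.Errorf("cannot uninstall %q: healthy dependents %v", target, blocking)
	}
	return nil
}

func main() {
	deps := []dependent{
		{Name: "todos-app", Healthy: true},
		{Name: "analytics", Healthy: false},
	}
	// Blocked: todos-app is healthy and depends on storage.
	fmt.Println(checkForHealthyDependents("storage", deps))
	// Allowed: the only remaining dependent is not healthy.
	fmt.Println(checkForHealthyDependents("storage", deps[1:]))
}
```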

Graceful Shutdown:

  • Sends SIGTERM to allow cleanup
  • Waits up to 10 seconds
  • Forces SIGKILL if timeout exceeded
  • Logs timeout for debugging

Soft Delete:

  • Preserves deployment history
  • Sets deleted_at timestamp
  • Transitions state to Stopped
  • Enables audit trail

Health Monitoring

Based on health_monitor.go:20-250:

Monitoring Loop

HealthMonitor struct (lines 20-29):

type HealthMonitor struct {
    deploymentRepo repository.DeploymentRepository
    appRepo        repository.AppRepository
    dockerClient   docker.Client
    eventBus       event.Bus
    logger         telemetry.Logger
    stopChan       chan struct{}
}

Monitor Process (Monitor function, lines 50-150):

1. Fetch all non-stopped deployments from database
2. For each deployment:
   - Inspect container state via Docker API
   - Check running status
   - Check health status
3. Handle state transitions:
   - Unhealthy → attempt restart
   - Not running → attempt restart
   - Healthy again after being unhealthy → publish recovery event
4. Track restart attempts (max 5 by default)
5. Mark as Failed if max restarts exceeded
6. Sleep for the configured interval, then repeat

Health States

| State | Description | Action |
|---|---|---|
| Healthy | Container running and healthy | No action |
| Unhealthy | Container running but failing health checks | Restart attempt |
| Not Running | Container stopped/crashed | Restart attempt |
| Failed | Max restarts exceeded | Mark failed, publish event |
| Recovered | Unhealthy → Healthy transition | Publish recovery event |

Restart Policy

Restart Limits (configurable):

  • Max restarts: 5 (default)
  • Counter resets on successful health
  • Exceeded → deployment marked Failed
  • Requires manual intervention after failure

Restart Process:

  1. Increment restart counter
  2. Stop container if running
  3. Start container
  4. Update deployment state
  5. Publish restart event
  6. Health check on next cycle
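
The restart limit can be sketched as a small counter per deployment. The restartTracker type, the action names, and the exact boundary (the check that follows the last allowed restart marks the deployment failed) are assumptions for illustration; health_monitor.go may count differently.

```go
package main

import "fmt"

type action string

const (
	actionNone    action = "none"
	actionRestart action = "restart"
	actionFail    action = "mark-failed"
)

// restartTracker counts consecutive restart attempts per deployment.
// It is a hypothetical helper, not a type from the codebase.
type restartTracker struct {
	maxRestarts int
	attempts    map[string]int
}

func newRestartTracker(maxRestarts int) *restartTracker {
	return &restartTracker{maxRestarts: maxRestarts, attempts: make(map[string]int)}
}

// observe records one health check result and returns what to do next.
func (t *restartTracker) observe(deploymentID string, healthy bool) action {
	if healthy {
		delete(t.attempts, deploymentID) // counter resets on successful health
		return actionNone
	}
	if t.attempts[deploymentID] >= t.maxRestarts {
		return actionFail // limit exhausted: needs manual intervention
	}
	t.attempts[deploymentID]++
	return actionRestart
}

func main() {
	t := newRestartTracker(2)
	for _, healthy := range []bool{false, true, false, false, false} {
		fmt.Println(t.observe("dep-1", healthy))
	}
	// Prints: restart, none, restart, restart, mark-failed (one per line).
}
```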

Environment Variables

Based on env_builder.go:45-250:

Three-Layer Precedence

Layer 1: Base Environment (lowest precedence)

  • Source: .env.example file
  • Filtered by safe prefix whitelist
  • Cached for performance
  • Docker address translation applied

Layer 2: App-Specific Overrides (middle precedence)

  • Source: Settings database
  • Key format: docker.env.{VAR_NAME}
  • Example: docker.env.LOG_LEVEL=debug
  • Loaded per-app from settings repository

Layer 3: Auto-Generated Variables (highest precedence)

// buildAutoGeneratedVars (lines 200-220)
APP_ID: <app UUID>
APP_NAME: <app name>
APPSERVER_HTTP_URL: http://appserver:8080
APPSERVER_GRPC_URL: appserver:9090
APPSERVER_GRAPHQL_URL: http://appserver:8080/graphql
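
The three layers can be merged with a simple "later layer wins" loop. The sketch below is an assumption about the mechanics, not the actual Build function: the helper names are hypothetical, and the settings are passed in as a plain map rather than loaded from the settings repository.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// overridePrefix is the settings key prefix described above.
const overridePrefix = "docker.env."

// overridesFromSettings keeps only docker.env.* settings and strips the prefix.
func overridesFromSettings(settings map[string]string) map[string]string {
	out := make(map[string]string)
	for key, value := range settings {
		if strings.HasPrefix(key, overridePrefix) && len(key) > len(overridePrefix) {
			out[strings.TrimPrefix(key, overridePrefix)] = value
		}
	}
	return out
}

// mergeEnv applies layers from lowest to highest precedence:
// a key in a later layer replaces the same key from an earlier one.
func mergeEnv(layers ...map[string]string) map[string]string {
	merged := make(map[string]string)
	for _, layer := range layers {
		for k, v := range layer {
			merged[k] = v
		}
	}
	return merged
}

// toEnvList renders KEY=VALUE strings in lexicographic order.
func toEnvList(env map[string]string) []string {
	list := make([]string, 0, len(env))
	for k, v := range env {
		list = append(list, k+"="+v)
	}
	sort.Strings(list)
	return list
}

func main() {
	base := map[string]string{"LOG_LEVEL": "info", "TZ": "UTC", "APP_NAME": "from-base"}
	overrides := overridesFromSettings(map[string]string{
		"docker.env.LOG_LEVEL": "debug",
		"ui.theme":             "dark", // ignored: not a docker.env.* key
	})
	auto := map[string]string{
		"APP_NAME":           "todos-app",
		"APPSERVER_HTTP_URL": "http://appserver:8080",
	}
	for _, kv := range toEnvList(mergeEnv(base, overrides, auto)) {
		fmt.Println(kv)
	}
}
```

In this run LOG_LEVEL ends up as debug (layer 2 beats layer 1) and APP_NAME as todos-app (layer 3 beats everything), which is the reason auto-generated values sit at the top: an app setting cannot redirect the AppServer URLs.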

Security Filtering

Safe Prefix Whitelist (shouldIncludeEnvVar, lines 156-180):

APPSERVER_*   // AppServer endpoints
LOG_LEVEL     // Logging configuration
NODE_ENV      // Node environment
TZ            // Timezone

Blocked Prefixes (direct infrastructure access):

POSTGRES_*    // No direct database access
REDIS_*       // No direct cache access
RABBITMQ_*    // No direct message queue access
KRATOS_*      // No direct auth access
HYDRA_*       // No direct OAuth access
OPENFGA_*     // No direct authz access

Rationale: Apps must access all infrastructure through AppServer proxies for security and monitoring.
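
A minimal sketch of such a filter, under two assumptions that the source does not confirm: blocked prefixes are checked first, and LOG_LEVEL, NODE_ENV, and TZ are matched as exact names rather than as prefixes. Anything that is neither blocked nor whitelisted is dropped.

```go
package main

import (
	"fmt"
	"strings"
)

var (
	// Prefixes that give direct infrastructure access; always rejected.
	blockedPrefixes = []string{"POSTGRES_", "REDIS_", "RABBITMQ_", "KRATOS_", "HYDRA_", "OPENFGA_"}
	// Prefixes that are safe to pass through.
	allowedPrefixes = []string{"APPSERVER_"}
	// Exact names that are safe to pass through (exact match is an assumption).
	allowedNames = map[string]bool{"LOG_LEVEL": true, "NODE_ENV": true, "TZ": true}
)

// shouldIncludeEnvVar reports whether a base environment variable may be
// handed to an app container. Unknown variables are excluded by default.
func shouldIncludeEnvVar(name string) bool {
	for _, prefix := range blockedPrefixes {
		if strings.HasPrefix(name, prefix) {
			return false
		}
	}
	if allowedNames[name] {
		return true
	}
	for _, prefix := range allowedPrefixes {
		if strings.HasPrefix(name, prefix) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{"APPSERVER_HTTP_URL", "LOG_LEVEL", "POSTGRES_PASSWORD", "HOME"} {
		fmt.Printf("%-20s %v\n", name, shouldIncludeEnvVar(name))
	}
}
```

The default-deny fallthrough matters more than the blocked list: a new infrastructure variable added to .env.example stays hidden from apps until someone whitelists it.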

Address Translation

Docker Network Name Resolution (transformValue, lines 182-220):

localhost:5432         →  postgres:5432
localhost:6379         →  redis:6379
http://localhost:8080  →  http://appserver:8080
ws://localhost:8080    →  ws://appserver:8080
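
The mappings above amount to substring replacement on each value. The sketch below uses strings.NewReplacer from the Go standard library; the host names come from the table, while the approach itself is an assumption and the real transformValue may parse URLs instead.

```go
package main

import (
	"fmt"
	"strings"
)

// dockerAddresses rewrites host-local addresses to Docker service names.
// Replacing "localhost:8080" also covers the http:// and ws:// forms.
var dockerAddresses = strings.NewReplacer(
	"localhost:5432", "postgres:5432",
	"localhost:6379", "redis:6379",
	"localhost:8080", "appserver:8080",
)

// transformValue returns value with every known local address translated.
// Addresses on other ports are left untouched.
func transformValue(value string) string {
	return dockerAddresses.Replace(value)
}

func main() {
	for _, v := range []string{
		"localhost:5432",
		"http://localhost:8080",
		"ws://localhost:8080",
		"localhost:3000", // unknown port: unchanged
	} {
		fmt.Printf("%s -> %s\n", v, transformValue(v))
	}
}
```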

Network and Volume Management

Shared Network

From pkg/v2/infrastructure/docker/network.go:

Network Configuration:

type NetworkConfig struct {
    Name   string // e.g., "appserver_network"
    Subnet string // e.g., "172.20.0.0/16"
}

NetworkManager (lines 15-25):

  • Idempotent network creation via EnsureNetwork()
  • Bridge driver for container communication
  • Attachable for Docker CLI inspection
  • Shared across all app containers

Service Discovery:

  • Container names resolve as hostnames
  • DNS provided by Docker's embedded DNS server on the user-defined bridge network
  • Example: http://todos-app:3000 resolves automatically

Volume Management

From pkg/v2/infrastructure/docker/volume.go:

Volume Naming Convention:

app-{appName}-data

Examples:

  • app-todos-app-data
  • app-analytics-data
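
As a small illustration of the convention, the helpers below build the volume name and the matching bind string for the /app/data mount. The function names are hypothetical.

```go
package main

import "fmt"

// volumeName applies the app-{appName}-data naming convention.
func volumeName(appName string) string {
	return "app-" + appName + "-data"
}

// dataBind returns the "volume:path" string that mounts the app's volume
// at /app/data inside the container.
func dataBind(appName string) string {
	return volumeName(appName) + ":/app/data"
}

func main() {
	fmt.Println(volumeName("todos-app")) // app-todos-app-data
	fmt.Println(dataBind("analytics"))   // app-analytics-data:/app/data
}
```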

VolumeManager (lines 12-30):

type VolumeManager struct {
    client docker.Client
    logger telemetry.Logger
}

// CreateVolume creates a Docker volume
// Mount path: /app/data inside container
// Driver: local
// Labels: Tagged for appserver management

Volume Lifecycle:

  • Created during deployment (deployApp, lines 330-345)
  • Mounted at /app/data in container
  • Persists across container restarts
  • Deleted on app uninstall
  • Idempotent creation (handles existing)

Deployment States

State Machine

From pkg/v2/domain/deployment/deployment.go:

Created
  ↓
PullingImage
  ├→ ImagePullFailed (can be retried)
  ↓
CreatingContainer
  ├→ ContainerCreated (event published)
  ↓
Starting
  ├→ ContainerStarted (event published)
  ↓
CheckingHealth
  ├→ Healthy (ready for traffic)
  ├→ Unhealthy (health monitor restarts)
  └→ Failed (max restarts exceeded)
  ↓
Stopped (via uninstall)
  ↓
(Soft deleted)

State Transitions:

  • Each transition logged to database
  • Events published for monitoring
  • Rollback possible from any state
  • Soft delete preserves history
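
One common way to enforce a state machine like this is a table of allowed transitions. The sketch below is illustrative only: the state names come from the diagram, but the transition table is an assumption inferred from it, and deployment.go may allow more (or fewer) transitions.

```go
package main

import "fmt"

// State is a deployment state. Names follow the diagram above.
type State string

const (
	Created           State = "Created"
	PullingImage      State = "PullingImage"
	ImagePullFailed   State = "ImagePullFailed"
	CreatingContainer State = "CreatingContainer"
	Starting          State = "Starting"
	CheckingHealth    State = "CheckingHealth"
	Healthy           State = "Healthy"
	Unhealthy         State = "Unhealthy"
	Failed            State = "Failed"
	Stopped           State = "Stopped"
)

// allowed lists the transitions this sketch accepts (an assumption).
var allowed = map[State][]State{
	Created:           {PullingImage},
	PullingImage:      {CreatingContainer, ImagePullFailed},
	ImagePullFailed:   {PullingImage}, // retry
	CreatingContainer: {Starting, Failed},
	Starting:          {CheckingHealth, Failed},
	CheckingHealth:    {Healthy, Unhealthy, Failed},
	Healthy:           {Unhealthy, Stopped},
	Unhealthy:         {Healthy, Failed, Stopped},
	Failed:            {Stopped},
}

// transition returns next if the move is allowed; otherwise it returns
// the unchanged current state and an error.
func transition(current, next State) (State, error) {
	for _, candidate := range allowed[current] {
		if candidate == next {
			return next, nil
		}
	}
	return current, fmt.Errorf("invalid transition %s -> %s", current, next)
}

func main() {
	s := Created
	for _, next := range []State{PullingImage, CreatingContainer, Starting, CheckingHealth, Healthy} {
		var err error
		if s, err = transition(s, next); err != nil {
			fmt.Println(err)
			return
		}
	}
	fmt.Println("final state:", s) // final state: Healthy

	_, err := transition(Created, Healthy)
	fmt.Println(err) // invalid transition Created -> Healthy
}
```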

Event-Driven Architecture

Events Consumed

// installer.go:71-121
func (i *InstallationOrchestrator) HandleAppInstalled(ctx context.Context, evt event.Event)

// uninstaller.go:40-90
func (u *UninstallationOrchestrator) HandleAppUninstalled(ctx context.Context, evt event.Event)

Event Sources:

  • app.installed - Published by Marketplace service
  • app.uninstalled - Published by Marketplace service

Events Published

Installation Events:

  • orchestration.install.started - Installation begun
  • orchestration.install.progress - Deployment progress updates
  • orchestration.install.completed - All containers healthy
  • orchestration.install.failed - Installation failed, rolled back

Container Lifecycle Events:

  • container.created - Container created
  • container.started - Container started
  • container.healthy - Health check passed
  • container.unhealthy - Health check failed
  • container.failed - Max restarts exceeded
  • container.restarted - Container restarted by health monitor

Image Events:

  • image.pull.started - Image pull begun
  • image.pull.completed - Image pull successful
  • image.pull.failed - Image pull failed

Uninstallation Events:

  • orchestration.uninstall.started - Uninstall begun
  • orchestration.uninstall.completed - Container removed
  • orchestration.uninstall.failed - Uninstall failed

Container Configuration

Deployment Specification

From installer.go:350-420:

Image Reference:

{registry_url}/{app_name}:{stage}

Example:

registry.example.com/todos-app:production
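
Building the reference is a single format call. The helper name is hypothetical, and trimming a trailing slash from the registry URL is an extra safety step assumed here, not something the source describes.

```go
package main

import (
	"fmt"
	"strings"
)

// imageRef builds {registry_url}/{app_name}:{stage}.
func imageRef(registryURL, appName, stage string) string {
	return fmt.Sprintf("%s/%s:%s", strings.TrimSuffix(registryURL, "/"), appName, stage)
}

func main() {
	fmt.Println(imageRef("registry.example.com", "todos-app", "production"))
	// registry.example.com/todos-app:production
}
```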

Container Config:

container.Config{
    Image: imageRef,
    Env:   envVarsList, // From EnvBuilder
    Labels: map[string]string{
        "app.id":        appID.String(),
        "deployment.id": deploymentID.String(),
        "app.name":      appName,
        "version":       app.Version,
        "managed-by":    "appserver",
    },
}

Host Config:

container.HostConfig{
    RestartPolicy: container.RestartPolicy{
        Name: "unless-stopped",
    },
    Resources: container.Resources{
        Memory:    config.MemoryLimit,
        CPUShares: config.CPUShares,
    },
    Binds: []string{
        volumeName + ":/app/data",
    },
}

Rollback and Error Handling

Failure Points

1. Image Pull Failure:

  • State: ImagePullFailed
  • Action: Publish failure event
  • Recovery: Retryable manually

2. Container Creation Failure:

  • Trigger: Session rollback
  • Action: Remove all session containers
  • Preserve: Existing dependencies

3. Container Start Failure:

  • Trigger: Session rollback
  • Action: Stop and remove containers
  • State: Mark deployment Failed

4. Health Check Failure:

  • State: Unhealthy
  • Action: Health monitor restart
  • Limit: Max 5 attempts

5. Dependency Resolution Failure:

  • Early abort
  • No rollback needed
  • Nothing created yet

Rollback Process

From session.go:85-150:

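func (s *InstallationSession) Rollback(
    ctx context.Context,
    dockerClient docker.Client,
    deploymentRepo repository.DeploymentRepository,
    logger telemetry.Logger,
) error {
    // Hold the read lock while walking the session's deployments.
    s.mu.RLock()
    defer s.mu.RUnlock()

    // Deployments are meant to be rolled back in reverse order. Ranging
    // over a Go map gives no ordering guarantee, so the ordering logic is
    // presumably handled outside this excerpt.
    for deploymentID, containerID := range s.deployments {
        // 1. Stop container (10s timeout)
        if err := dockerClient.StopContainer(ctx, containerID, stopTimeout); err != nil {
            logger.Warn("failed to stop container during rollback", ...)
        }

        // 2. Remove container (forced)
        if err := dockerClient.RemoveContainer(ctx, containerID, true); err != nil {
            logger.Warn("failed to remove container during rollback", ...)
        }

        // 3. Mark deployment as Failed. The variable is named dep so it
        // does not shadow the deployment package.
        dep, err := deploymentRepo.FindByID(ctx, deploymentID)
        if err != nil {
            logger.Warn("failed to load deployment during rollback", ...)
        } else {
            dep.State = deployment.StateFailed
            if err := deploymentRepo.Update(ctx, dep); err != nil {
                logger.Warn("failed to mark deployment as failed", ...)
            }
        }

        // 4. Publish failure events (eventBus is not among the parameters
        // shown in this excerpt)
        failEvt := event.NewContainerFailed(...)
        eventBus.Publish(ctx, failEvt)
    }
    return nil
}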

Standalone Binary

Orchestrator Service

From pkg/v2/orchestrator/orchestrator.go:

Orchestrator struct (lines 20-35):

type Orchestrator struct {
    installer     *orchestration.InstallationOrchestrator
    uninstaller   *orchestration.UninstallationOrchestrator
    healthMonitor *orchestration.HealthMonitor
    eventBus      event.Bus
    logger        telemetry.Logger
    stopChan      chan struct{}
}

Lifecycle:

// Start() - lines 60-120
1. Subscribe to app.installed event
2. Subscribe to app.uninstalled event
3. Start health monitor in goroutine
4. Block waiting for stop signal

// Stop() - lines 122-145
1. Close stop channel
2. Unsubscribe from events
3. Stop health monitor
4. Close connections

Entry Point

From cmd/orchestrator/main.go:

func main() {
    // 1. Load configuration from environment
    cfg := config.LoadFromEnv()

    // 2. Validate Docker is enabled
    if !cfg.Docker.Enabled {
        log.Fatal("Docker orchestration disabled")
    }

    // 3. Initialize telemetry
    logger := telemetry.NewLogger(cfg.Telemetry)

    // 4. Create orchestrator instance
    orch, err := orchestrator.New(cfg, logger)
    if err != nil {
        log.Fatalf("Failed to create orchestrator: %v", err)
    }

    // 5. Set up graceful shutdown
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    signalChan := make(chan os.Signal, 1)
    signal.Notify(signalChan, os.Interrupt, syscall.SIGTERM)

    // 6. Start orchestrator in a goroutine, because Start blocks until
    // the orchestrator is stopped
    go func() {
        if err := orch.Start(ctx); err != nil {
            log.Fatalf("Failed to start orchestrator: %v", err)
        }
    }()

    // 7. Wait for shutdown signal
    <-signalChan

    // 8. Stop orchestrator gracefully
    orch.Stop()
}

Configuration

From pkg/v2/config/docker.go:

type DockerConfig struct {
    Enabled             bool   // APPSERVER_DOCKER_ENABLED
    SocketPath          string // DOCKER_SOCKET_PATH
    NetworkName         string // DOCKER_NETWORK_NAME
    NetworkSubnet       string // DOCKER_NETWORK_SUBNET
    RegistryURL         string // DOCKER_REGISTRY_URL
    RegistryUsername    string // DOCKER_REGISTRY_USERNAME
    RegistryPassword    string // DOCKER_REGISTRY_PASSWORD
    RegistryStage       string // DOCKER_REGISTRY_STAGE (e.g., "production")
    MemoryLimit         int64  // Default container memory limit
    CPUShares           int64  // Default CPU shares
    MaxRestarts         int    // Health monitor restart limit
    StopTimeout         int    // Graceful stop timeout (seconds)
    HealthCheckInterval int    // Health check interval (seconds)
}

Environment Variables:

# Core
APPSERVER_DOCKER_ENABLED=true

# Docker Connection
DOCKER_SOCKET_PATH=/var/run/docker.sock

# Network
DOCKER_NETWORK_NAME=appserver_network
DOCKER_NETWORK_SUBNET=172.20.0.0/16

# Registry
DOCKER_REGISTRY_URL=registry.example.com
DOCKER_REGISTRY_USERNAME=user
DOCKER_REGISTRY_PASSWORD=secret
DOCKER_REGISTRY_STAGE=production

# Resources
DOCKER_MEMORY_LIMIT=536870912 # 512MB
DOCKER_CPU_SHARES=1024

# Health Monitoring
DOCKER_MAX_RESTARTS=5
DOCKER_STOP_TIMEOUT=10
DOCKER_HEALTH_CHECK_INTERVAL=30
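
The sketch below shows one way the health-monitoring variables could be read with defaults and validation. It is not the code in pkg/v2/config/docker.go: the healthSettings type and helper names are hypothetical, and the fallback values (5, 10, 30) are taken from the example values above. The lookup function is injected so the logic can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// lookupFunc matches the signature of os.LookupEnv.
type lookupFunc func(key string) (string, bool)

// healthSettings holds the health-monitoring subset of the Docker config.
type healthSettings struct {
	MaxRestarts         int
	StopTimeout         int // seconds
	HealthCheckInterval int // seconds
}

// intFromEnv returns fallback when key is unset or empty, and an error
// when the value is not a positive integer.
func intFromEnv(lookup lookupFunc, key string, fallback int) (int, error) {
	raw, ok := lookup(key)
	if !ok || raw == "" {
		return fallback, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", key, err)
	}
	if n <= 0 {
		return 0, fmt.Errorf("%s must be positive, got %d", key, n)
	}
	return n, nil
}

// loadHealthSettings reads the three health-monitoring variables.
func loadHealthSettings(lookup lookupFunc) (healthSettings, error) {
	var s healthSettings
	var err error
	if s.MaxRestarts, err = intFromEnv(lookup, "DOCKER_MAX_RESTARTS", 5); err != nil {
		return s, err
	}
	if s.StopTimeout, err = intFromEnv(lookup, "DOCKER_STOP_TIMEOUT", 10); err != nil {
		return s, err
	}
	if s.HealthCheckInterval, err = intFromEnv(lookup, "DOCKER_HEALTH_CHECK_INTERVAL", 30); err != nil {
		return s, err
	}
	return s, nil
}

func main() {
	settings, err := loadHealthSettings(os.LookupEnv)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", settings)
}
```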

Concurrency Patterns

Thread Safety

InstallationSession:

type InstallationSession struct {
    mu             sync.RWMutex // Protects deployments maps
    deployments    map[uuid.UUID]string
    appDeployments map[uuid.UUID]uuid.UUID
}

  • RWMutex for concurrent access
  • Write lock for registration
  • Read lock for rollback

HealthMonitor:

  • Single-threaded monitoring loop
  • Goroutine per orchestrator instance
  • Stop channel for graceful shutdown

Stateless Design:

  • No shared mutable state across requests
  • Each installation gets unique session
  • Thread-safe through database locking

Horizontal Scalability

Multiple Orchestrator Instances:

  • Stateless service design
  • Event queue provides work distribution
  • Database provides coordination
  • No leader election needed

Database Connection Pooling:

  • Max connections: 25 per instance
  • Idle connections: 5
  • Connection reuse for performance

Security Considerations

Environment Variable Filtering

Prevents:

  • Direct database access
  • Direct cache access
  • Direct message queue access
  • Privilege escalation through credentials

Enforces:

  • All external service access through AppServer
  • Monitoring and logging at AppServer layer
  • Fine-grained permission checks

Container Isolation

Resource Limits:

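  • Memory limits cap each container's memory use, which contains memory-exhaustion attacks
  • CPU shares set a relative weight that applies under contention, so one busy container cannot starve the others
  • Together these keep a single container from exhausting host resources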

Network Isolation:

  • Shared network for inter-container communication
  • No host network access
  • No privileged containers

Volume Isolation:

  • App-specific volumes only
  • No host path mounts
  • No privileged file access

Event-Based Authorization

Security Properties:

  • Only responds to authenticated events
  • Deployment records tied to app IDs
  • State changes audited through events
  • No direct API access (event-driven only)

Performance Characteristics

Scalability

Horizontal Scaling:

  • Stateless orchestrator instances
  • RabbitMQ provides work queue
  • Database connection pooling
  • No coordination overhead

Deployment Parallelism:

  • Within dependency constraints
  • Parallel deployment of independent apps
  • Sequential deployment of dependent apps
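
One way to get this behavior is to group apps into dependency levels and deploy each level concurrently, waiting for a level to finish before starting the next. The sketch below is an assumption about how such scheduling could look, not a description of installer.go; the function names and app names are hypothetical, and every dependency must itself appear as a key in the map.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// levels groups apps so that every app's dependencies sit in an earlier level.
// Apps within one level do not depend on each other.
func levels(deps map[string][]string) ([][]string, error) {
	remaining := make(map[string]bool, len(deps))
	for app := range deps {
		remaining[app] = true
	}
	placed := make(map[string]bool, len(deps))
	var out [][]string
	for len(remaining) > 0 {
		var level []string
		for app := range remaining {
			ready := true
			for _, d := range deps[app] {
				if !placed[d] {
					ready = false
					break
				}
			}
			if ready {
				level = append(level, app)
			}
		}
		if len(level) == 0 {
			return nil, fmt.Errorf("dependency cycle or unknown dependency among %d apps", len(remaining))
		}
		sort.Strings(level) // deterministic output
		for _, app := range level {
			placed[app] = true
			delete(remaining, app)
		}
		out = append(out, level)
	}
	return out, nil
}

// deployAll runs deploy concurrently for the apps in each level and waits
// for the whole level before moving on to the next one.
func deployAll(deps map[string][]string, deploy func(app string)) error {
	lv, err := levels(deps)
	if err != nil {
		return err
	}
	for _, level := range lv {
		var wg sync.WaitGroup
		for _, app := range level {
			wg.Add(1)
			go func(app string) {
				defer wg.Done()
				deploy(app)
			}(app)
		}
		wg.Wait()
	}
	return nil
}

func main() {
	deps := map[string][]string{
		"storage":   nil,
		"auth":      {"storage"},
		"analytics": {"storage"},
		"todos-app": {"auth"},
	}
	lv, err := levels(deps)
	fmt.Println(lv, err) // [[storage] [analytics auth] [todos-app]] <nil>

	// Order within a level is not deterministic; order across levels is.
	_ = deployAll(deps, func(app string) { fmt.Println("deploying", app) })
}
```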

Latency

Typical Deployment:

  • Image pull: 5-30 seconds (varies by size)
  • Container creation: <1 second
  • Container start: <1 second
  • Health check: 5-30 seconds (configurable)
  • Total: 10-60 seconds typical

Health Check Frequency:

  • Configurable interval (default: 30s)
  • Runs continuously in background
  • Low overhead (inspect API calls)

Code Reference Table

| Component | File | Lines | Tests/Verification | Description |
|---|---|---|---|---|
| InstallationOrchestrator | orchestration/installer.go | 19-34 | integration/orchestration_e2e_test.go:50-150 | Main installation coordinator |
| HandleAppInstalled | orchestration/installer.go | 71-121 | Event handler integration test | Event handler entry point |
| Install | orchestration/installer.go | 124-280 | Full installation flow test | 10-step installation workflow |
| deployApp | orchestration/installer.go | 282-450 | Unit tested | Single app deployment |
| UninstallationOrchestrator | orchestration/uninstaller.go | 20-35 | integration/orchestration_e2e_test.go:200-280 | Handles graceful uninstall |
| HandleAppUninstalled | orchestration/uninstaller.go | 40-90 | Event handler test | Event handler entry point |
| Uninstall | orchestration/uninstaller.go | 92-200 | Full uninstall flow test | 10-step uninstall workflow |
| checkForHealthyDependents | orchestration/uninstaller.go | 203-250 | Dependent guard test | Safety check |
| HealthMonitor | orchestration/health_monitor.go | 20-29 | integration/health_monitor_test.go:30-120 | Continuous monitoring |
| Monitor | orchestration/health_monitor.go | 50-150 | Health check loop test | Main monitoring loop |
| handleUnhealthyDeployment | orchestration/health_monitor.go | 152-230 | Restart policy test | Restart logic |
| EnvBuilder | orchestration/env_builder.go | 16-25 | orchestration/env_builder_test.go:20-200 | Environment assembly |
| Build | orchestration/env_builder.go | 45-105 | 3-layer precedence test | Main build function |
| shouldIncludeEnvVar | orchestration/env_builder.go | 156-180 | Security filtering test | Whitelist filter |
| transformValue | orchestration/env_builder.go | 182-220 | Address translation test | Docker name resolution |
| InstallationSession | orchestration/session.go | 12-35 | Session rollback test | Tracks deployments |
| Rollback | orchestration/session.go | 85-150 | Rollback flow test | Cleanup on failure |
| NetworkManager | infrastructure/docker/network.go | 15-80 | Unit tested | Network operations |
| VolumeManager | infrastructure/docker/volume.go | 12-90 | Unit tested | Volume operations |
| Docker Client | infrastructure/docker/client.go | 20-500 | Integration tested | Docker API wrapper |
| Orchestrator Service | orchestrator/orchestrator.go | 20-145 | integration/orchestrator_test.go | Service coordinator |
| Main Entry | cmd/orchestrator/main.go | 1-150 | Manual testing | Standalone binary |