Docker Orchestration
Built-in Docker container lifecycle management for application backend services with event-driven deployment, health monitoring, and automatic rollback.
Scope
This document covers the following packages and their interfaces:
| Layer | Packages | Key Files |
|---|---|---|
| Application | pkg/v2/application/orchestration/ | installer.go, uninstaller.go, health_monitor.go, env_builder.go, session.go |
| Domain Models | pkg/v2/domain/deployment/, pkg/v2/domain/event/ | deployment.go, orchestration_events.go, container_events.go |
| Infrastructure | pkg/v2/infrastructure/docker/ | client.go, network.go, volume.go, registry.go |
| Repositories | pkg/v2/domain/repository/ | deployment_repository.go, settings_repository.go |
| Orchestrator Binary | cmd/orchestrator/ | main.go |
| Orchestrator Service | pkg/v2/orchestrator/ | orchestrator.go, services.go |
| Configuration | pkg/v2/config/ | docker.go |
Overview
Easy AppServer provides Docker orchestration as an optional standalone binary (cmd/orchestrator/) with the following characteristics:
- Event-Driven: Listens to app.installed and app.uninstalled events via RabbitMQ
- Stateless: Multiple orchestrator instances can run concurrently for horizontal scaling
- Reactive: Only performs actions in response to lifecycle events
- Session Tracking: Smart rollback on installation failures
- Health Monitoring: Continuous container health checks with automatic restart
- Dependency-Aware: Deploys apps in correct order based on dependency graph
- Volume Management: Persistent data across container restarts
- Network Isolation: Shared Docker network for inter-container communication
Architecture
Layered Structure
Event Bus (RabbitMQ)
↓
Orchestrator Binary (standalone)
↓
Application Layer (installer, uninstaller, health_monitor)
↓
Infrastructure Layer (docker client, network, volume)
↓
Docker Daemon
Core Components
InstallationOrchestrator (pkg/v2/application/orchestration/installer.go:19-34)
type InstallationOrchestrator struct {
deploymentRepo repository.DeploymentRepository
appRepo repository.AppRepository
dependencyRepo repository.DependencyRepository
settingsRepo repository.SettingsRepository
dependencyResolver service.DependencyResolver
dockerClient docker.Client
registry docker.RegistryInterface
networkManager *docker.NetworkManager
volumeManager *docker.VolumeManager
envBuilder *EnvBuilder
eventBus event.Bus
config config.DockerConfig
logger telemetry.Logger
}
Dependencies & Interactions:
- → Event Bus (RabbitMQ): Subscribes to app.installed/uninstalled events
- → Deployment Repository: Tracks deployment state in PostgreSQL
- → App Repository: Fetches app manifests and dependency information
- → Settings Repository: Loads app-specific environment variable overrides
- → Dependency Resolver: Resolves transitive dependencies and installation order
- → Docker Client: Manages containers, images, networks, volumes
- → Docker Registry: Pulls container images
- → Network Manager: Creates and manages shared Docker network
- → Volume Manager: Creates and manages persistent volumes
- → Env Builder: Assembles environment variables from multiple sources
UninstallationOrchestrator (pkg/v2/application/orchestration/uninstaller.go)
- Handles graceful container shutdown
- Checks for healthy dependents before uninstall
- Cleans up networks and volumes
- Soft-deletes deployment records
HealthMonitor (pkg/v2/application/orchestration/health_monitor.go)
- Continuous monitoring loop
- Automatic restart on container crashes
- Max restart limit tracking
- Health event publishing
EnvBuilder (pkg/v2/application/orchestration/env_builder.go:16-25)
type EnvBuilder struct {
baseEnvFile string // Path to .env.example
settingsRepo repository.SettingsRepository
logger telemetry.Logger
baseEnvCache map[string]string // Cached base env vars
appserverHost string // Docker service name
appserverHTTP string // http://appserver:8080
appserverGRPC string // appserver:9090
}
Installation Flow
Based on installer.go:124-280:
Process Steps
| Step | Function | File:Lines | Description |
|---|---|---|---|
| 1 | Resolve transitive dependencies | installer.go:126-129 | BFS to find all dependencies recursively |
| 2 | Get installation order | installer.go:132-135 | Topological sort ensures dependencies first |
| 3 | Load app details | installer.go:145-148 | Fetch app manifests for all apps |
| 4 | Filter already-deployed | installer.go:150-175 | Skip apps with healthy deployments |
| 5 | Ensure network exists | installer.go:177-185 | Create shared Docker network |
| 6 | Create installation session | installer.go:187-190 | Track deployments for rollback |
| 7 | Deploy each app | installer.go:192-268 | Deploy in dependency order |
| 8 | Publish success events | installer.go:270-280 | Notify completion |
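Steps 1 and 2 of the table above (BFS over dependencies, then a topological sort) can be sketched as follows. This is a minimal illustration, not the code in installer.go: the DependencyGraph type and both function names are hypothetical, and apps are identified by name rather than UUID to keep the example self-contained.

```go
package main

import "sort"

// DependencyGraph maps an app name to the apps it depends on directly.
// Hypothetical type for illustration only.
type DependencyGraph map[string][]string

// TransitiveDeps returns target plus every app reachable from it (BFS).
func TransitiveDeps(g DependencyGraph, target string) []string {
	seen := map[string]bool{target: true}
	queue := []string{target}
	for i := 0; i < len(queue); i++ {
		for _, dep := range g[queue[i]] {
			if !seen[dep] {
				seen[dep] = true
				queue = append(queue, dep)
			}
		}
	}
	return queue
}

// InstallOrder sorts apps so every dependency precedes its dependents
// (Kahn's algorithm). The second result is false when a cycle is found.
func InstallOrder(g DependencyGraph, apps []string) ([]string, bool) {
	inSet := make(map[string]bool, len(apps))
	for _, a := range apps {
		inSet[a] = true
	}
	pending := make(map[string]int, len(apps)) // unmet dependencies per app
	dependents := make(map[string][]string)    // reverse edges
	for _, a := range apps {
		for _, dep := range g[a] {
			if inSet[dep] {
				pending[a]++
				dependents[dep] = append(dependents[dep], a)
			}
		}
	}
	var ready []string
	for _, a := range apps {
		if pending[a] == 0 {
			ready = append(ready, a)
		}
	}
	sort.Strings(ready) // deterministic output for equal-priority apps
	var order []string
	for len(ready) > 0 {
		a := ready[0]
		ready = ready[1:]
		order = append(order, a)
		next := dependents[a]
		sort.Strings(next)
		for _, d := range next {
			pending[d]--
			if pending[d] == 0 {
				ready = append(ready, d)
			}
		}
	}
	return order, len(order) == len(apps)
}
```

For a graph where todos depends on auth and storage, and auth depends on storage, the resulting order is storage, auth, todos. A false second result signals a dependency cycle, which corresponds to the early-abort case where nothing has been created yet.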
Deployment Sub-Steps
For each app in installation order (deployApp function, lines 282-450):
| Sub-Step | Function | Description |
|---|---|---|
| 1 | Create deployment record | Database tracking with Created state |
| 2 | Publish image pull start | Event notification |
| 3 | Pull Docker image | From registry with progress tracking |
| 4 | Update state to PullingImage | Database state transition |
| 5 | Publish image pull complete | Success event |
| 6 | Build environment variables | 3-layer precedence (base, overrides, auto-gen) |
| 7 | Create volume | Persistent storage /app/data |
| 8 | Create container | With full config (env, labels, resources) |
| 9 | Update state to CreatingContainer | Database state transition |
| 10 | Publish container created | Event notification |
| 11 | Connect to network | Attach to shared network |
| 12 | Start container | Docker container start |
| 13 | Update state to Starting | Database state transition |
| 14 | Publish container started | Event notification |
| 15 | Wait for health | Check container running state |
| 16 | Mark as healthy | Update deployment state to Healthy |
| 17 | Publish container healthy | Success event |
| 18 | Register in session | Track for potential rollback |
Installation Session
From session.go:
type InstallationSession struct {
TargetAppID uuid.UUID
TargetAppName string
mu sync.RWMutex // Thread-safe operations
deployments map[uuid.UUID]string // deploymentID -> containerID
appDeployments map[uuid.UUID]uuid.UUID // appID -> deploymentID
}
Purpose:
- Tracks only deployments created in current installation
- Enables surgical rollback on failure
- Preserves already-installed shared dependencies
- Thread-safe with RWMutex
Rollback Process (lines 85-150):
For each deployment in session (reverse order):
1. Stop container (10s graceful timeout)
2. Remove container (forced if needed)
3. Mark deployment as Failed
4. Publish failure events
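A reverse-order rollback needs the session to remember creation order, which a Go map alone does not provide. The sketch below shows one way to do that under stated assumptions: the Session and trackedDeployment types are hypothetical simplifications of InstallationSession, IDs are plain strings instead of uuid.UUID, and the undo callback stands in for the stop, remove, mark-failed, and publish steps.

```go
package main

import "sync"

// trackedDeployment pairs a deployment record with its container.
type trackedDeployment struct {
	DeploymentID string
	ContainerID  string
}

// Session records deployments in creation order so that rollback can
// undo them newest first. Hypothetical simplification of InstallationSession.
type Session struct {
	mu      sync.RWMutex
	tracked []trackedDeployment
}

// Register records a deployment created by this installation (write lock).
func (s *Session) Register(deploymentID, containerID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tracked = append(s.tracked, trackedDeployment{deploymentID, containerID})
}

// Rollback calls undo for every tracked deployment in reverse creation
// order (read lock). It attempts all of them and returns the first error.
func (s *Session) Rollback(undo func(deploymentID, containerID string) error) error {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var firstErr error
	for i := len(s.tracked) - 1; i >= 0; i-- {
		d := s.tracked[i]
		if err := undo(d.DeploymentID, d.ContainerID); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}
```

Because only deployments registered in this session are visited, shared dependencies that were already installed stay untouched.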
Uninstallation Flow
Based on uninstaller.go:60-200:
Process Steps
| Step | Function | File:Lines | Description |
|---|---|---|---|
| 1 | Find deployment | uninstaller.go:70-83 | Locate active deployment record |
| 2 | Check dependents | uninstaller.go:85-113 | Block if healthy apps depend on this |
| 3 | Publish start event | uninstaller.go:115-122 | Notify uninstall beginning |
| 4 | Stop container | uninstaller.go:125-135 | Graceful stop with SIGTERM, 10s timeout |
| 5 | Remove container | uninstaller.go:137-147 | Force remove if needed |
| 6 | Disconnect network | uninstaller.go:149-159 | Detach from shared network |
| 7 | Clean up volume | uninstaller.go:161-171 | Delete persistent volume |
| 8 | Update deployment state | uninstaller.go:173-182 | Transition to Stopped |
| 9 | Soft delete | uninstaller.go:184-192 | Set deleted_at timestamp |
| 10 | Publish complete event | uninstaller.go:194-201 | Notify success |
Safety Guards
Dependent Check (checkForHealthyDependents, lines 203-250):
- Finds all apps with dependencies on target app
- Checks if any are currently healthy
- Returns error if healthy dependents exist
- Prevents cascade failures
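The dependent check can be pictured as a filter over the dependency map. The following is a sketch only: the function name, the map-based inputs, and the error wording are assumptions, whereas the real checkForHealthyDependents reads from the dependency and deployment repositories.

```go
package main

import (
	"fmt"
	"sort"
)

// checkHealthyDependents returns an error naming every healthy app that
// depends directly on target. dependsOn maps an app to its dependencies;
// healthy reports whether an app currently has a healthy deployment.
func checkHealthyDependents(target string, dependsOn map[string][]string, healthy map[string]bool) error {
	var blockers []string
	for app, deps := range dependsOn {
		if !healthy[app] {
			continue // unhealthy or undeployed dependents do not block
		}
		for _, dep := range deps {
			if dep == target {
				blockers = append(blockers, app)
				break
			}
		}
	}
	if len(blockers) == 0 {
		return nil
	}
	sort.Strings(blockers) // stable error message
	return fmt.Errorf("cannot uninstall %s: healthy dependents %v", target, blockers)
}
```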
Graceful Shutdown:
- Sends SIGTERM to allow cleanup
- Waits up to 10 seconds
- Forces SIGKILL if timeout exceeded
- Logs timeout for debugging
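The SIGTERM-then-SIGKILL sequence looks roughly like the sketch below. The Docker Engine's own stop operation already applies this pattern with a grace period, so the orchestrator likely delegates to it; the stopper type here is a hypothetical stand-in used only to make the control flow visible.

```go
package main

import "time"

// stopper abstracts the two container operations this sketch needs.
// Hypothetical type: it is not the docker.Client interface from the source.
type stopper struct {
	Signal func(containerID, sig string) error
	Exited func(containerID string) <-chan struct{} // closed once the container exits
}

// stopGracefully sends SIGTERM, waits up to timeout for the container to
// exit, and falls back to SIGKILL. It reports whether the kill was forced.
func stopGracefully(s stopper, containerID string, timeout time.Duration) (bool, error) {
	if err := s.Signal(containerID, "SIGTERM"); err != nil {
		return false, err
	}
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-s.Exited(containerID):
		return false, nil // clean exit within the grace period
	case <-timer.C:
		return true, s.Signal(containerID, "SIGKILL")
	}
}
```

The boolean result gives the caller a natural place to log the timeout for debugging.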
Soft Delete:
- Preserves deployment history
- Sets deleted_at timestamp
- Transitions state to Stopped
- Enables audit trail
Health Monitoring
Based on health_monitor.go:20-250:
Monitoring Loop
HealthMonitor struct (lines 20-29):
type HealthMonitor struct {
deploymentRepo repository.DeploymentRepository
appRepo repository.AppRepository
dockerClient docker.Client
eventBus event.Bus
logger telemetry.Logger
stopChan chan struct{}
}
Monitor Process (Monitor function, lines 50-150):
1. Fetch all non-stopped deployments from database
2. For each deployment:
- Inspect container state via Docker API
- Check running status
- Check health status
3. Handle state transitions:
- Unhealthy → attempt restart
- Not running → attempt restart
- Healthy → publish recovery event
4. Track restart attempts (max 5)
5. Mark as Failed if max restarts exceeded
6. Sleep interval, repeat
Health States
| State | Description | Action |
|---|---|---|
| Healthy | Container running and healthy | No action |
| Unhealthy | Container running but failing health checks | Restart attempt |
| Not Running | Container stopped/crashed | Restart attempt |
| Failed | Max restarts exceeded | Mark failed, publish event |
| Recovered | Unhealthy → Healthy transition | Publish recovery event |
Restart Policy
Restart Limits (configurable):
- Max restarts: 5 (default)
- Counter resets on successful health
- Exceeded → deployment marked Failed
- Requires manual intervention after failure
Restart Process:
- Increment restart counter
- Stop container if running
- Start container
- Update deployment state
- Publish restart event
- Health check on next cycle
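The restart limit and counter reset described above amount to a small decision function. This sketch assumes an in-memory counter keyed by deployment ID; the type and function names are hypothetical, and the real monitor may persist the count on the deployment record instead.

```go
package main

// Action is what the monitor should do after one health observation.
type Action int

const (
	ActionNone Action = iota
	ActionRestart
	ActionMarkFailed
)

// restartTracker counts consecutive restart attempts per deployment.
// Hypothetical helper; not taken from health_monitor.go.
type restartTracker struct {
	maxRestarts int
	attempts    map[string]int
}

func newRestartTracker(maxRestarts int) *restartTracker {
	return &restartTracker{maxRestarts: maxRestarts, attempts: make(map[string]int)}
}

// Observe records one health observation and returns the resulting action.
func (r *restartTracker) Observe(deploymentID string, healthy bool) Action {
	if healthy {
		delete(r.attempts, deploymentID) // counter resets on successful health
		return ActionNone
	}
	if r.attempts[deploymentID] >= r.maxRestarts {
		return ActionMarkFailed // limit exceeded, manual intervention needed
	}
	r.attempts[deploymentID]++
	return ActionRestart
}
```

With the default limit of 5, the sixth consecutive unhealthy observation yields ActionMarkFailed.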
Environment Variables
Based on env_builder.go:45-250:
Three-Layer Precedence
Layer 1: Base Environment (lowest precedence)
- Source: .env.example file
- Filtered by safe prefix whitelist
- Cached for performance
- Docker address translation applied
Layer 2: App-Specific Overrides (middle precedence)
- Source: Settings database
- Key format: docker.env.{VAR_NAME}
- Example: docker.env.LOG_LEVEL=debug
- Loaded per-app from settings repository
Layer 3: Auto-Generated Variables (highest precedence)
// buildAutoGeneratedVars (lines 200-220)
APP_ID: <app UUID>
APP_NAME: <app name>
APPSERVER_HTTP_URL: http://appserver:8080
APPSERVER_GRPC_URL: appserver:9090
APPSERVER_GRAPHQL_URL: http://appserver:8080/graphql
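The three-layer precedence reduces to overlaying maps from lowest to highest priority. The helper names below are hypothetical and the sketch omits filtering and address translation; it only shows how a later layer wins over an earlier one.

```go
package main

import "sort"

// mergeEnv overlays the given layers in order, so values from later
// layers replace values from earlier ones.
func mergeEnv(layers ...map[string]string) map[string]string {
	merged := make(map[string]string)
	for _, layer := range layers {
		for k, v := range layer {
			merged[k] = v
		}
	}
	return merged
}

// toEnvList renders the merged map as sorted KEY=VALUE strings, the
// shape expected by a container's Env field.
func toEnvList(env map[string]string) []string {
	list := make([]string, 0, len(env))
	for k, v := range env {
		list = append(list, k+"="+v)
	}
	sort.Strings(list)
	return list
}
```

Calling mergeEnv(base, overrides, autoGenerated) means a docker.env.LOG_LEVEL override beats the base file, and an auto-generated APP_NAME beats both.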
Security Filtering
Safe Prefix Whitelist (shouldIncludeEnvVar, lines 156-180):
APPSERVER_* // AppServer endpoints
LOG_LEVEL // Logging configuration
NODE_ENV // Node environment
TZ // Timezone
Blocked Prefixes (direct infrastructure access):
POSTGRES_* // No direct database access
REDIS_* // No direct cache access
RABBITMQ_* // No direct message queue access
KRATOS_* // No direct auth access
HYDRA_* // No direct OAuth access
OPENFGA_* // No direct authz access
Rationale: Apps must access all infrastructure through AppServer proxies for security and monitoring.
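A filter matching the lists above could look like this sketch. Two points are assumptions rather than facts from the source: names that match neither list are denied by default, and LOG_LEVEL, NODE_ENV, and TZ are treated as exact names rather than prefixes.

```go
package main

import "strings"

// Lists copied from the documentation above.
var blockedPrefixes = []string{"POSTGRES_", "REDIS_", "RABBITMQ_", "KRATOS_", "HYDRA_", "OPENFGA_"}
var allowedPrefixes = []string{"APPSERVER_"}
var allowedNames = map[string]bool{"LOG_LEVEL": true, "NODE_ENV": true, "TZ": true}

// includeEnvVar reports whether a base environment variable may be
// passed to an app container. Unknown names are rejected (assumption).
func includeEnvVar(name string) bool {
	for _, p := range blockedPrefixes {
		if strings.HasPrefix(name, p) {
			return false
		}
	}
	if allowedNames[name] {
		return true
	}
	for _, p := range allowedPrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}
```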
Address Translation
Docker Network Name Resolution (transformValue, lines 182-220):
localhost:5432 → postgres:5432
localhost:6379 → redis:6379
http://localhost:8080 → http://appserver:8080
ws://localhost:8080 → ws://appserver:8080
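The mappings above can be expressed with a single strings.Replacer from the standard library. This is a sketch, not the transformValue implementation: it rewrites any occurrence of the listed host:port pairs, so the http:// and ws:// cases are covered by the localhost:8080 entry.

```go
package main

import "strings"

// dockerAddresses rewrites host-local addresses to Docker service names.
// The pairs mirror the translation table in the documentation.
var dockerAddresses = strings.NewReplacer(
	"localhost:5432", "postgres:5432",
	"localhost:6379", "redis:6379",
	"localhost:8080", "appserver:8080",
)

// translateAddress applies the replacements to one env var value.
func translateAddress(value string) string {
	return dockerAddresses.Replace(value)
}
```

A plain substring replacement also matches longer ports that start with the same digits, so a stricter version would parse the value as a URL first.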
Network and Volume Management
Shared Network
From pkg/v2/infrastructure/docker/network.go:
Network Configuration:
type NetworkConfig struct {
Name string // e.g., "appserver_network"
Subnet string // e.g., "172.20.0.0/16"
}
NetworkManager (lines 15-25):
- Idempotent network creation via EnsureNetwork()
- Bridge driver for container communication
- Attachable for Docker CLI inspection
- Shared across all app containers
Service Discovery:
- Container names resolve as hostnames
- DNS provided by Docker's embedded DNS server on the user-defined bridge network
- Example: http://todos-app:3000 resolves automatically
Volume Management
From pkg/v2/infrastructure/docker/volume.go:
Volume Naming Convention:
app-{appName}-data
Examples:
- app-todos-app-data
- app-analytics-data
VolumeManager (lines 12-30):
type VolumeManager struct {
client docker.Client
logger telemetry.Logger
}
// CreateVolume creates a Docker volume
// Mount path: /app/data inside container
// Driver: local
// Labels: Tagged for appserver management
Volume Lifecycle:
- Created during deployment (deployApp, lines 330-345)
- Mounted at /app/data in container
- Persists across container restarts
- Deleted on app uninstall
- Idempotent creation (handles existing)
Deployment States
State Machine
From pkg/v2/domain/deployment/deployment.go:
Created
↓
PullingImage
├→ ImagePullFailed (can be retried)
↓
CreatingContainer
├→ ContainerCreated (event published)
↓
Starting
├→ ContainerStarted (event published)
↓
CheckingHealth
├→ Healthy (ready for traffic)
├→ Unhealthy (health monitor restarts)
└→ Failed (max restarts exceeded)
↓
Stopped (via uninstall)
↓
(Soft deleted)
State Transitions:
- Each transition logged to database
- Events published for monitoring
- Rollback possible from any state
- Soft delete preserves history
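One way to enforce such a state machine is a table of allowed transitions. The sketch below is an interpretation of the diagram, not the contents of deployment.go: the constant names follow the StateFailed naming seen elsewhere in this document, and the exact edge set (for example, which states may move to Failed during rollback) is an assumption.

```go
package main

// State is a deployment lifecycle state.
type State string

const (
	StateCreated           State = "Created"
	StatePullingImage      State = "PullingImage"
	StateImagePullFailed   State = "ImagePullFailed"
	StateCreatingContainer State = "CreatingContainer"
	StateStarting          State = "Starting"
	StateCheckingHealth    State = "CheckingHealth"
	StateHealthy           State = "Healthy"
	StateUnhealthy         State = "Unhealthy"
	StateFailed            State = "Failed"
	StateStopped           State = "Stopped"
)

// transitions lists the assumed legal next states for each state.
var transitions = map[State][]State{
	StateCreated:           {StatePullingImage, StateFailed},
	StatePullingImage:      {StateCreatingContainer, StateImagePullFailed},
	StateImagePullFailed:   {StatePullingImage}, // retry
	StateCreatingContainer: {StateStarting, StateFailed},
	StateStarting:          {StateCheckingHealth, StateFailed},
	StateCheckingHealth:    {StateHealthy, StateUnhealthy, StateFailed},
	StateHealthy:           {StateUnhealthy, StateStopped},
	StateUnhealthy:         {StateHealthy, StateFailed, StateStopped},
	StateFailed:            {StateStopped},
}

// CanTransition reports whether moving from one state to another is allowed.
func CanTransition(from, to State) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```

Stopped has no outgoing edges in this table, which matches its role as the last state before soft deletion.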
Event-Driven Architecture
Events Consumed
// installer.go:71-121
func (i *InstallationOrchestrator) HandleAppInstalled(ctx context.Context, evt event.Event)
// uninstaller.go:40-90
func (u *UninstallationOrchestrator) HandleAppUninstalled(ctx context.Context, evt event.Event)
Event Sources:
- app.installed - Published by Marketplace service
- app.uninstalled - Published by Marketplace service
Events Published
Installation Events:
- orchestration.install.started - Installation begun
- orchestration.install.progress - Deployment progress updates
- orchestration.install.completed - All containers healthy
- orchestration.install.failed - Installation failed, rolled back
Container Lifecycle Events:
- container.created - Container created
- container.started - Container started
- container.healthy - Health check passed
- container.unhealthy - Health check failed
- container.failed - Max restarts exceeded
- container.restarted - Container restarted by health monitor
Image Events:
- image.pull.started - Image pull begun
- image.pull.completed - Image pull successful
- image.pull.failed - Image pull failed
Uninstallation Events:
- orchestration.uninstall.started - Uninstall begun
- orchestration.uninstall.completed - Container removed
- orchestration.uninstall.failed - Uninstall failed
Container Configuration
Deployment Specification
From installer.go:350-420:
Image Reference:
{registry_url}/{app_name}:{stage}
Example:
registry.example.com/todos-app:production
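Building the reference from its three parts is a one-line format call. The function name is hypothetical, and trimming a trailing slash from the registry URL is a defensive assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// imageRef builds {registry_url}/{app_name}:{stage}.
func imageRef(registryURL, appName, stage string) string {
	return fmt.Sprintf("%s/%s:%s", strings.TrimSuffix(registryURL, "/"), appName, stage)
}
```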
Container Config:
container.Config{
Image: imageRef,
Env: envVarsList, // From EnvBuilder
Labels: map[string]string{
"app.id": appID.String(),
"deployment.id": deploymentID.String(),
"app.name": appName,
"version": app.Version,
"managed-by": "appserver",
},
}
Host Config:
container.HostConfig{
RestartPolicy: container.RestartPolicy{
Name: "unless-stopped",
},
Resources: container.Resources{
Memory: config.MemoryLimit,
CPUShares: config.CPUShares,
},
Binds: []string{
volumeName + ":/app/data",
},
}
Rollback and Error Handling
Failure Points
1. Image Pull Failure:
- State: ImagePullFailed
- Action: Publish failure event
- Recovery: Retryable manually
2. Container Creation Failure:
- Trigger: Session rollback
- Action: Remove all session containers
- Preserve: Existing dependencies
3. Container Start Failure:
- Trigger: Session rollback
- Action: Stop and remove containers
- State: Mark deployment Failed
4. Health Check Failure:
- State: Unhealthy
- Action: Health monitor restart
- Limit: Max 5 attempts
5. Dependency Resolution Failure:
- Early abort
- No rollback needed
- Nothing created yet
Rollback Process
From session.go:85-150:
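func (s *InstallationSession) Rollback(
	ctx context.Context,
	dockerClient docker.Client,
	deploymentRepo repository.DeploymentRepository,
	logger telemetry.Logger,
) error {
	// Hold the read lock while walking the session's deployments.
	s.mu.RLock()
	defer s.mu.RUnlock()
	// Note: Go map iteration order is unspecified, so ranging over
	// s.deployments does not by itself yield reverse creation order;
	// that requires tracking insertion order separately.
	for deploymentID, containerID := range s.deployments {
		// 1. Stop container (10s timeout)
		if err := dockerClient.StopContainer(ctx, containerID, stopTimeout); err != nil {
			logger.Warn("failed to stop container during rollback", ...)
		}
		// 2. Remove container (forced)
		if err := dockerClient.RemoveContainer(ctx, containerID, true); err != nil {
			logger.Warn("failed to remove container during rollback", ...)
		}
		// 3. Mark deployment as Failed (skip if the record cannot be loaded)
		dep, err := deploymentRepo.FindByID(ctx, deploymentID)
		if err != nil {
			logger.Warn("failed to load deployment during rollback", ...)
			continue
		}
		dep.State = deployment.StateFailed
		if err := deploymentRepo.Update(ctx, dep); err != nil {
			logger.Warn("failed to update deployment during rollback", ...)
		}
		// 4. Publish failure events (eventBus is assumed to be in scope;
		// it is not among the parameters shown in this excerpt)
		failEvt := event.NewContainerFailed(...)
		eventBus.Publish(ctx, failEvt)
	}
	return nil
}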
Standalone Binary
Orchestrator Service
From pkg/v2/orchestrator/orchestrator.go:
Orchestrator struct (lines 20-35):
type Orchestrator struct {
installer *orchestration.InstallationOrchestrator
uninstaller *orchestration.UninstallationOrchestrator
healthMonitor *orchestration.HealthMonitor
eventBus event.Bus
logger telemetry.Logger
stopChan chan struct{}
}
Lifecycle:
// Start() - lines 60-120
1. Subscribe to app.installed event
2. Subscribe to app.uninstalled event
3. Start health monitor in goroutine
4. Block waiting for stop signal
// Stop() - lines 122-145
1. Close stop channel
2. Unsubscribe from events
3. Stop health monitor
4. Close connections
Entry Point
From cmd/orchestrator/main.go:
func main() {
	// 1. Load configuration from environment
	cfg := config.LoadFromEnv()
	// 2. Validate Docker is enabled
	if !cfg.Docker.Enabled {
		log.Fatal("Docker orchestration disabled")
	}
	// 3. Initialize telemetry
	logger := telemetry.NewLogger(cfg.Telemetry)
	// 4. Create orchestrator instance
	orch, err := orchestrator.New(cfg, logger)
	if err != nil {
		log.Fatal("Failed to create orchestrator: ", err)
	}
	// 5. Setup graceful shutdown
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	signalChan := make(chan os.Signal, 1)
	signal.Notify(signalChan, os.Interrupt, syscall.SIGTERM)
	// 6. Start orchestrator in a goroutine, because Start blocks until stopped
	errChan := make(chan error, 1)
	go func() { errChan <- orch.Start(ctx) }()
	// 7. Wait for a shutdown signal or an error from Start
	select {
	case <-signalChan:
	case err := <-errChan:
		if err != nil {
			log.Fatal("Failed to start orchestrator: ", err)
		}
	}
	// 8. Stop orchestrator gracefully
	orch.Stop()
}
Configuration
From pkg/v2/config/docker.go:
type DockerConfig struct {
Enabled bool // APPSERVER_DOCKER_ENABLED
SocketPath string // DOCKER_SOCKET_PATH
NetworkName string // DOCKER_NETWORK_NAME
NetworkSubnet string // DOCKER_NETWORK_SUBNET
RegistryURL string // DOCKER_REGISTRY_URL
RegistryUsername string // DOCKER_REGISTRY_USERNAME
RegistryPassword string // DOCKER_REGISTRY_PASSWORD
RegistryStage string // DOCKER_REGISTRY_STAGE (e.g., "production")
MemoryLimit int64 // Default container memory limit
CPUShares int64 // Default CPU shares
MaxRestarts int // Health monitor restart limit
StopTimeout int // Graceful stop timeout (seconds)
HealthCheckInterval int // Health check interval (seconds)
}
Environment Variables:
# Core
APPSERVER_DOCKER_ENABLED=true
# Docker Connection
DOCKER_SOCKET_PATH=/var/run/docker.sock
# Network
DOCKER_NETWORK_NAME=appserver_network
DOCKER_NETWORK_SUBNET=172.20.0.0/16
# Registry
DOCKER_REGISTRY_URL=registry.example.com
DOCKER_REGISTRY_USERNAME=user
DOCKER_REGISTRY_PASSWORD=secret
DOCKER_REGISTRY_STAGE=production
# Resources
DOCKER_MEMORY_LIMIT=536870912 # 512MB
DOCKER_CPU_SHARES=1024
# Health Monitoring
DOCKER_MAX_RESTARTS=5
DOCKER_STOP_TIMEOUT=10
DOCKER_HEALTH_CHECK_INTERVAL=30
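Reading these variables with fallbacks might look like the sketch below. It covers only a subset of the fields, uses a hypothetical dockerSettings type instead of DockerConfig, and takes the lookup function as a parameter so it can be exercised without touching the process environment. The fallback values are assumptions drawn from the documented defaults (5 restarts, 30s interval) and the example values above.

```go
package main

import "strconv"

// dockerSettings is a hypothetical subset of DockerConfig.
type dockerSettings struct {
	Enabled                    bool
	NetworkName                string
	MaxRestarts                int
	StopTimeoutSeconds         int
	HealthCheckIntervalSeconds int
}

// intOr parses raw as an int and returns fallback when it is empty or invalid.
func intOr(raw string, fallback int) int {
	if v, err := strconv.Atoi(raw); err == nil {
		return v
	}
	return fallback
}

// loadDockerSettings reads settings through getenv, for example os.Getenv.
func loadDockerSettings(getenv func(string) string) dockerSettings {
	enabled, err := strconv.ParseBool(getenv("APPSERVER_DOCKER_ENABLED"))
	if err != nil {
		enabled = false // assumption: orchestration is off unless enabled explicitly
	}
	network := getenv("DOCKER_NETWORK_NAME")
	if network == "" {
		network = "appserver_network" // assumption: example value used as fallback
	}
	return dockerSettings{
		Enabled:                    enabled,
		NetworkName:                network,
		MaxRestarts:                intOr(getenv("DOCKER_MAX_RESTARTS"), 5),
		StopTimeoutSeconds:         intOr(getenv("DOCKER_STOP_TIMEOUT"), 10),
		HealthCheckIntervalSeconds: intOr(getenv("DOCKER_HEALTH_CHECK_INTERVAL"), 30),
	}
}
```

In a binary this would be called as loadDockerSettings(os.Getenv); invalid numeric values silently fall back here, whereas a production loader would more likely report them.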
Concurrency Patterns
Thread Safety
InstallationSession:
type InstallationSession struct {
mu sync.RWMutex // Protects deployments maps
deployments map[uuid.UUID]string
appDeployments map[uuid.UUID]uuid.UUID
}
- RWMutex for concurrent access
- Write lock for registration
- Read lock for rollback
HealthMonitor:
- Single-threaded monitoring loop
- Goroutine per orchestrator instance
- Stop channel for graceful shutdown
Stateless Design:
- No shared mutable state across requests
- Each installation gets unique session
- Thread-safe through database locking
Horizontal Scalability
Multiple Orchestrator Instances:
- Stateless service design
- Event queue provides work distribution
- Database provides coordination
- No leader election needed
Database Connection Pooling:
- Max connections: 25 per instance
- Idle connections: 5
- Connection reuse for performance
Security Considerations
Environment Variable Filtering
Prevents:
- Direct database access
- Direct cache access
- Direct message queue access
- Privilege escalation through credentials
Enforces:
- All external service access through AppServer
- Monitoring and logging at AppServer layer
- Fine-grained permission checks
Container Isolation
Resource Limits:
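- Memory limits cap each container, so a runaway app is OOM-killed rather than exhausting host memory
- CPU shares set a relative scheduling weight under contention (a weight, not a hard cap)
- Together these keep a single container from starving the host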
Network Isolation:
- Shared network for inter-container communication
- No host network access
- No privileged containers
Volume Isolation:
- App-specific volumes only
- No host path mounts
- No privileged file access
Event-Based Authorization
Security Properties:
- Only responds to authenticated events
- Deployment records tied to app IDs
- State changes audited through events
- No direct API access (event-driven only)
Performance Characteristics
Scalability
Horizontal Scaling:
- Stateless orchestrator instances
- RabbitMQ provides work queue
- Database connection pooling
- No coordination overhead
Deployment Parallelism:
- Within dependency constraints
- Parallel deployment of independent apps
- Sequential deployment of dependent apps
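Parallelism within dependency constraints can be modeled as waves: apps in the same wave have no unmet dependencies and may deploy concurrently, while waves run one after another. This is a sketch under that assumption, with hypothetical function names; the source does not show how the installer schedules parallel work.

```go
package main

import (
	"sort"
	"sync"
)

// deployWaves groups apps so that each wave depends only on earlier waves.
// The second result is false when a dependency cycle prevents progress.
func deployWaves(dependsOn map[string][]string) ([][]string, bool) {
	remaining := make(map[string]bool)
	for app, deps := range dependsOn {
		remaining[app] = true
		for _, d := range deps {
			remaining[d] = true
		}
	}
	var waves [][]string
	for len(remaining) > 0 {
		var wave []string
		for app := range remaining {
			ready := true
			for _, d := range dependsOn[app] {
				if remaining[d] {
					ready = false
					break
				}
			}
			if ready {
				wave = append(wave, app)
			}
		}
		if len(wave) == 0 {
			return waves, false // cycle: nothing can be deployed
		}
		sort.Strings(wave)
		for _, app := range wave {
			delete(remaining, app)
		}
		waves = append(waves, wave)
	}
	return waves, true
}

// runWave deploys every app in a wave concurrently and returns the
// first error by wave position, after all goroutines have finished.
func runWave(wave []string, deploy func(app string) error) error {
	errs := make([]error, len(wave))
	var wg sync.WaitGroup
	for i, app := range wave {
		wg.Add(1)
		go func(i int, app string) {
			defer wg.Done()
			errs[i] = deploy(app)
		}(i, app)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}
```

If todos depends on auth and storage, and notes depends on auth, the waves are [auth, storage] followed by [notes, todos].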
Latency
Typical Deployment:
- Image pull: 5-30 seconds (varies by size)
- Container creation: <1 second
- Container start: <1 second
- Health check: 5-30 seconds (configurable)
- Total: 10-60 seconds typical
Health Check Frequency:
- Configurable interval (default: 30s)
- Runs continuously in background
- Low overhead (inspect API calls)
Code Reference Table
| Component | File | Lines | Tests/Verification | Description |
|---|---|---|---|---|
| InstallationOrchestrator | orchestration/installer.go | 19-34 | integration/orchestration_e2e_test.go:50-150 | Main installation coordinator |
| HandleAppInstalled | orchestration/installer.go | 71-121 | Event handler integration test | Event handler entry point |
| Install | orchestration/installer.go | 124-280 | Full installation flow test | 8-step installation workflow |
| deployApp | orchestration/installer.go | 282-450 | Unit tested | Single app deployment |
| UninstallationOrchestrator | orchestration/uninstaller.go | 20-35 | integration/orchestration_e2e_test.go:200-280 | Handles graceful uninstall |
| HandleAppUninstalled | orchestration/uninstaller.go | 40-90 | Event handler test | Event handler entry point |
| Uninstall | orchestration/uninstaller.go | 92-200 | Full uninstall flow test | 10-step uninstall workflow |
| checkForHealthyDependents | orchestration/uninstaller.go | 203-250 | Dependent guard test | Safety check |
| HealthMonitor | orchestration/health_monitor.go | 20-29 | integration/health_monitor_test.go:30-120 | Continuous monitoring |
| Monitor | orchestration/health_monitor.go | 50-150 | Health check loop test | Main monitoring loop |
| handleUnhealthyDeployment | orchestration/health_monitor.go | 152-230 | Restart policy test | Restart logic |
| EnvBuilder | orchestration/env_builder.go | 16-25 | orchestration/env_builder_test.go:20-200 | Environment assembly |
| Build | orchestration/env_builder.go | 45-105 | 3-layer precedence test | Main build function |
| shouldIncludeEnvVar | orchestration/env_builder.go | 156-180 | Security filtering test | Whitelist filter |
| transformValue | orchestration/env_builder.go | 182-220 | Address translation test | Docker name resolution |
| InstallationSession | orchestration/session.go | 12-35 | Session rollback test | Tracks deployments |
| Rollback | orchestration/session.go | 85-150 | Rollback flow test | Cleanup on failure |
| NetworkManager | infrastructure/docker/network.go | 15-80 | Unit tested | Network operations |
| VolumeManager | infrastructure/docker/volume.go | 12-90 | Unit tested | Volume operations |
| Docker Client | infrastructure/docker/client.go | 20-500 | Integration tested | Docker API wrapper |
| Orchestrator Service | orchestrator/orchestrator.go | 20-145 | integration/orchestrator_test.go | Service coordinator |
| Main Entry | cmd/orchestrator/main.go | 1-150 | Manual testing | Standalone binary |
Related Topics
- Application Marketplace - App installation triggers orchestration
- Application Lifecycle - Lifecycle states and transitions
- Event-Driven Architecture - Event patterns and pub/sub
- Dependency Management - Dependency resolution algorithms
- Platform Architecture - Overall system architecture
- Settings Management - App-specific environment variable overrides
- Configuration - Docker configuration options