Docker Images
All pipeline stages run inside Docker containers built from a multi-stage Dockerfile. This ensures consistent, reproducible environments across development, testing, and production.
Why Docker?
Docker packages everything the pipeline needs (Python, epymodelingsuite, and all dependencies) into a single image that can run anywhere: your laptop, a cloud VM, or CI/CD.
There are two image variants (local and cloud) that share the same base with all pipeline dependencies. The local variant is minimal for development with Docker Compose, while the cloud variant adds gcloud CLI for production runs on Google Cloud Batch.
Image architecture
graph TB
subgraph "Multi-Stage Build"
BASE[base stage<br/>Python + Dependencies]
LOCAL[local stage<br/>Base + Local Tools]
CLOUD[cloud stage<br/>Base + gcloud CLI]
end
SUITE[epymodelingsuite<br/>GitHub Package] -->|uv sync --frozen| BASE
DEPS[docker/pyproject.toml + uv.lock] -->|uv export ❘ uv pip install| BASE
BASE --> LOCAL
BASE --> CLOUD
GCLOUD[gcloud CLI] --> CLOUD
LOCAL --> LOCAL_IMG[epymodelingsuite:local]
CLOUD --> CLOUD_IMG[epymodelingsuite:cloud]
style BASE fill:#4285f4,color:#fff
style LOCAL fill:#34a853,color:#fff
style CLOUD fill:#fbbc04,color:#000

The Dockerfile uses multi-stage builds to create two image variants:
- local: Minimal image for local development (no cloud tools)
- cloud: Full image with gcloud CLI for Google Cloud execution
Both variants share the same base dependencies and epymodelingsuite package.
Build stages
base
- Installs Python 3.11, system packages (git, curl), and the uv package manager
- Clones and installs epymodelingsuite from GitHub using uv sync --frozen
- Installs cloud-specific dependencies from docker/pyproject.toml + docker/uv.lock into the same virtual environment
local
- Inherits from base
- Minimal image for local development and testing with Docker Compose
cloud
- Inherits from base
- Adds gcloud CLI for Secret Manager access and Cloud Storage authentication
- Used for production runs on Google Cloud Batch
Dependency installation
Dependencies are installed from two sources, both using locked versions:
1. epymodelingsuite
- Cloned from the mobs-lab/epymodelingsuite GitHub repo to /opt/epymodelingsuite/
- Installed via uv sync --frozen, which creates a virtual environment at /opt/epymodelingsuite/.venv using the repo's uv.lock
- Falls back to uv sync (unlocked) if uv.lock is not present
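The lockfile fallback described above can be sketched in Python. This is only an illustration of the decision, not the Dockerfile's actual logic; the function name `uv_sync_command` is hypothetical.

```python
from pathlib import Path


def uv_sync_command(repo_dir: str) -> list:
    """Return the uv command used to install a cloned repo's dependencies.

    Hypothetical helper: it mirrors the documented behaviour of preferring
    a locked install and falling back to an unlocked one.
    """
    # Prefer the locked install when the repo ships a uv.lock file.
    if (Path(repo_dir) / "uv.lock").is_file():
        return ["uv", "sync", "--frozen"]
    # No lockfile present: fall back to an unlocked resolve.
    return ["uv", "sync"]
```

The returned list could be passed to `subprocess.run(..., cwd=repo_dir, check=True)` from a build helper.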
Build arguments
The repository and branch/tag are configurable via Docker build arguments, which are populated from your epycloud configuration:
- GITHUB_MODELING_SUITE_REPO: GitHub repo (from github.modeling_suite_repo)
- GITHUB_MODELING_SUITE_REF: Branch or commit (from github.modeling_suite_ref, default: main)
- GITHUB_PAT: Personal access token (from secrets.yaml, only needed for private repos)
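As a rough sketch of how these build arguments might be assembled, the Python function below turns a configuration dictionary into a `docker build` command line. It assumes the configuration is a nested dict with a `github` section and that the Dockerfile stage names match the variant names (`local`, `cloud`). The function name `docker_build_command` is hypothetical and does not describe how epycloud itself invokes Docker.

```python
from typing import Optional


def docker_build_command(config: dict, target: str, tag: str,
                         github_pat: Optional[str] = None) -> list:
    """Assemble a `docker build` command from an epycloud-style config dict."""
    github = config.get("github", {})
    build_args = {
        "GITHUB_MODELING_SUITE_REPO": github["modeling_suite_repo"],
        # The documented default ref is `main`.
        "GITHUB_MODELING_SUITE_REF": github.get("modeling_suite_ref", "main"),
    }
    # The token is only needed for private repos, so pass it only when set.
    if github_pat:
        build_args["GITHUB_PAT"] = github_pat

    cmd = ["docker", "build", "--target", target, "--tag", tag]
    for name, value in build_args.items():
        cmd += ["--build-arg", f"{name}={value}"]
    cmd.append(".")  # build context: assumed to be the current directory
    return cmd
```

For example, `docker_build_command({"github": {"modeling_suite_repo": "mobs-lab/epymodelingsuite"}}, "cloud", "epymodelingsuite:cloud")` yields a command that builds the cloud stage with the default `main` ref.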
2. Cloud-specific dependencies
- Defined in this repo's docker/pyproject.toml (google-cloud-storage, dill, python-json-logger)
- Locked in docker/uv.lock
- Installed into the same venv via uv export --frozen --no-dev | uv pip install --no-cache -r -
This two-step approach is needed because uv sync is project-centric (creates its own .venv), while uv pip install respects the VIRTUAL_ENV environment variable, allowing cloud deps to be added to the existing epymodelingsuite venv.
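To make the role of VIRTUAL_ENV concrete, here is a minimal Python sketch of the same export-then-install pipe. It assumes `uv` is on the PATH and that `project_dir` contains the pyproject.toml and uv.lock to export. Both function names are hypothetical, and the image build itself presumably runs the shell pipeline directly.

```python
import os
import subprocess


def build_install_pipeline(venv_dir: str):
    """Return (export_cmd, install_cmd, env) for the two-step install."""
    export_cmd = ["uv", "export", "--frozen", "--no-dev"]
    install_cmd = ["uv", "pip", "install", "--no-cache", "-r", "-"]
    # `uv pip install` targets the environment named by VIRTUAL_ENV,
    # which is what lets the packages land in the existing venv.
    env = {**os.environ, "VIRTUAL_ENV": venv_dir}
    return export_cmd, install_cmd, env


def run_install_pipeline(project_dir: str, venv_dir: str) -> None:
    """Pipe the exported requirements into `uv pip install`."""
    export_cmd, install_cmd, env = build_install_pipeline(venv_dir)
    export = subprocess.Popen(export_cmd, cwd=project_dir, stdout=subprocess.PIPE)
    install = subprocess.Popen(install_cmd, stdin=export.stdout, env=env)
    export.stdout.close()  # lets the exporter see a closed pipe if install exits early
    install_rc = install.wait()
    export_rc = export.wait()
    if export_rc != 0 or install_rc != 0:
        raise RuntimeError(f"install failed (export={export_rc}, install={install_rc})")
```

Only the install step receives the modified environment, so the export still reads the project's own lockfile while the packages are installed into `venv_dir`.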
Pipeline scripts
All pipeline scripts are copied into the image at /scripts/ during build:
/scripts/
├── run_dispatcher.sh # Unified entrypoint (routes to stage scripts)
├── run_builder.sh # Stage A wrapper (clones experiment repo)
├── run_runner.sh # Stage B wrapper (downloads repo tarball or clones)
├── run_output.sh # Stage C wrapper (optional repo clone)
├── main_builder.py # Stage A main script
├── main_runner.py # Stage B main script
├── main_output.py # Stage C main script
└── util/
    ├── __init__.py
    ├── config.py # Configuration utilities
    ├── error_handling.py # Error handling utilities
    ├── logger.py # Structured JSON logging
    └── storage.py # Storage abstraction layer
Entrypoint: Both local and cloud images use /scripts/run_dispatcher.sh as the entrypoint. The dispatcher routes to the correct stage script based on the STAGE environment variable (builder/A, runner/B, or output/C).
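The routing rule can be illustrated with a small Python sketch. The real dispatcher is a shell script whose contents are not shown here, so treat this as a model of the mapping only. The function name `resolve_stage_script` and the case-insensitive matching are assumptions.

```python
# Accepted STAGE values and the wrapper script each one routes to.
STAGE_SCRIPTS = {
    "builder": "/scripts/run_builder.sh",
    "a": "/scripts/run_builder.sh",
    "runner": "/scripts/run_runner.sh",
    "b": "/scripts/run_runner.sh",
    "output": "/scripts/run_output.sh",
    "c": "/scripts/run_output.sh",
}


def resolve_stage_script(stage: str) -> str:
    """Map a STAGE value (name or letter) to its wrapper script path."""
    # Assumption: matching ignores case and surrounding whitespace.
    key = stage.strip().lower()
    if key not in STAGE_SCRIPTS:
        raise ValueError(f"unknown STAGE {stage!r}; expected builder/A, runner/B, or output/C")
    return STAGE_SCRIPTS[key]
```

A dispatcher built on this would read `os.environ["STAGE"]`, resolve the script, and hand control to it.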
Container paths
These are the key directories inside the container. The first three are baked into the image at build time, while /data/ paths are populated at runtime (mounted locally or accessed via GCS).
| Path | Purpose | Source |
|---|---|---|
| /app/ | Working directory | Dockerfile WORKDIR |
| /scripts/ | Pipeline scripts | Copied from docker/scripts/ at build time |
| /opt/epymodelingsuite/ | epymodelingsuite package + venv | Cloned from GitHub at build time |
| /data/forecast/ | Experiment data (configs, common-data, functions) | Cloned from GitHub at runtime (cloud) or mounted from ./local/forecast/ (local) |
| /data/bucket/ | Pipeline artifacts (inputs, results, outputs) | GCS via storage module (cloud) or mounted from ./local/bucket/ (local) |
Environment variables
Containers receive environment variables at runtime that control their behavior. In cloud mode, these are set by the workflow when submitting Batch jobs. In local mode, they are set via Docker Compose or the epycloud CLI.
All stages
| Variable | Description |
|---|---|
| EXECUTION_MODE | cloud or local, determines storage backend and authentication |
| EXP_ID | Experiment identifier |
| RUN_ID | Run identifier (auto-generated in cloud, manual in local) |
| LOG_LEVEL | Logging level (default: INFO) |
Cloud mode only
| Variable | Description |
|---|---|
| GCS_BUCKET | GCS bucket name for artifact storage |
| DIR_PREFIX | Base directory prefix (e.g., pipeline/flu/) |
| GITHUB_FORECAST_REPO | Experiment data repo to clone (format: owner/repo) |
| FORECAST_REPO_REF | Branch/tag/commit for experiment repo |
| GCLOUD_PROJECT_ID | Google Cloud project ID |
| GITHUB_PAT_SECRET | Secret Manager secret name for GitHub PAT |
| FORECAST_REPO_DIR | Path to clone experiment repo into (default: /data/forecast/) |
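A script inside the container might gather these variables into a single validated object. The sketch below shows one way to do that in Python. The `RuntimeConfig` class, the `load_runtime_config` function, and the choice of which variables are mandatory (`CLOUD_REQUIRED`, plus EXP_ID and RUN_ID) are all assumptions made for illustration; the actual util/config.py may work differently.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical choice of which cloud-only variables are mandatory.
CLOUD_REQUIRED = ("GCS_BUCKET", "GITHUB_FORECAST_REPO", "GCLOUD_PROJECT_ID")


@dataclass(frozen=True)
class RuntimeConfig:
    execution_mode: str
    exp_id: str
    run_id: str
    log_level: str
    gcs_bucket: Optional[str]
    dir_prefix: str
    forecast_repo_dir: str


def load_runtime_config(env: Mapping[str, str]) -> RuntimeConfig:
    """Validate and collect the documented environment variables."""
    mode = env.get("EXECUTION_MODE", "")
    if mode not in ("cloud", "local"):
        raise ValueError(f"EXECUTION_MODE must be 'cloud' or 'local', got {mode!r}")

    required = ["EXP_ID", "RUN_ID"]
    if mode == "cloud":
        required.extend(CLOUD_REQUIRED)
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))

    return RuntimeConfig(
        execution_mode=mode,
        exp_id=env["EXP_ID"],
        run_id=env["RUN_ID"],
        log_level=env.get("LOG_LEVEL", "INFO"),  # documented default
        gcs_bucket=env.get("GCS_BUCKET"),
        dir_prefix=env.get("DIR_PREFIX", ""),
        forecast_repo_dir=env.get("FORECAST_REPO_DIR", "/data/forecast/"),  # documented default
    )
```

Calling `load_runtime_config(os.environ)` at startup would fail fast with a single message listing every missing variable, rather than failing later on first use.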
Stage-specific
| Variable | Stage | Description |
|---|---|---|
| TASK_INDEX | B | Task index for local mode (overrides BATCH_TASK_INDEX) |
| BATCH_TASK_INDEX | B | Task index set automatically by Cloud Batch |
| NUM_TASKS | C | Number of Stage B result files to load |
| OUTPUT_CONFIG_FILE | C | Output config filename (e.g., output_projection.yaml) |
| ALLOW_PARTIAL_RESULTS | C | Set to true to generate outputs when some Stage B tasks failed |
| STORAGE_VERBOSE | All | Enable verbose storage logging (default: true in local mode) |
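The precedence between TASK_INDEX and BATCH_TASK_INDEX, and the meaning of ALLOW_PARTIAL_RESULTS, can be sketched as follows. The function names are hypothetical, and two behaviours are assumptions not stated in the tables: falling back to index 0 when neither variable is set, and accepting "true" case-insensitively.

```python
from typing import Mapping


def resolve_task_index(env: Mapping[str, str]) -> int:
    """Pick the Stage B task index, letting TASK_INDEX override BATCH_TASK_INDEX."""
    for name in ("TASK_INDEX", "BATCH_TASK_INDEX"):
        value = env.get(name)
        if value not in (None, ""):
            return int(value)
    return 0  # assumption: default to the first task when neither is set


def allow_partial_results(env: Mapping[str, str]) -> bool:
    """Interpret ALLOW_PARTIAL_RESULTS; only the value 'true' enables it."""
    return env.get("ALLOW_PARTIAL_RESULTS", "").strip().lower() == "true"


def missing_task_indices(found: set, num_tasks: int) -> list:
    """List the Stage B task indices (0..num_tasks-1) with no result file."""
    return [i for i in range(num_tasks) if i not in found]
```

Stage C could combine these by aborting when `missing_task_indices(...)` is non-empty unless `allow_partial_results(...)` returns true.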
Container structure tests
Images are validated after build using container-structure-test, configured in docker/container-structure-test.yaml. These tests run automatically during Cloud Build and can also be run locally.
The tests verify:
- epymodelingsuite is importable
- Python executable comes from the uv-managed venv (/opt/epymodelingsuite/.venv/)
- Cloud dependencies are importable (google-cloud-storage, dill, python-json-logger)
- Entrypoint script (run_dispatcher.sh) exists and is executable
$ docker run --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v "$(pwd)/docker/container-structure-test.yaml":/config.yaml \
    gcr.io/gcp-runtimes/container-structure-test:latest test \
    --image <image-name> --config /config.yaml
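For a quick sanity check from inside a running container, the same four properties can be approximated in Python. This is a sketch only and not a replacement for container-structure-test. The helper names are hypothetical, and the import names used in the usage note (`google.cloud.storage`, `dill`, `pythonjsonlogger`) are the modules those packages provide.

```python
import importlib.util
import os
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


def python_in_venv(venv_dir: str, executable: str = sys.executable) -> bool:
    """Check that the interpreter path lives under the given venv directory."""
    # abspath (not realpath) is used on purpose: a venv's python is usually
    # a symlink to the base interpreter outside the venv.
    root = os.path.abspath(venv_dir)
    return os.path.abspath(executable).startswith(root + os.sep)


def is_executable_file(path: str) -> bool:
    """Check that `path` exists as a regular file with execute permission."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

Inside the image, one might assert `module_available("epymodelingsuite")`, `module_available("google.cloud.storage")`, `python_in_venv("/opt/epymodelingsuite/.venv")`, and `is_executable_file("/scripts/run_dispatcher.sh")`.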
Next steps
- Execution Modes: How local and cloud image variants are used in each mode
- Building Images: How to build and push Docker images
- Pipeline Stages: How images are used in each stage
- Cloud Infrastructure: Where images are deployed
- Storage Abstraction: How scripts inside images access data