Configuration Variables¶
Complete reference for all configuration keys, their types, and default values. These keys can be set in any configuration file (base config.yaml, environments, profiles, or project config).
For an introduction to the configuration system, file locations, and resolution order, see Configuring epycloud.
When each config is used¶
Configuration keys are consumed at three different points: infrastructure deployment, image building, and workflow execution. Understanding this helps you know which changes require redeploying infrastructure versus just re-running a workflow.
Infrastructure deployment¶
These keys are read when you run epycloud terraform apply. Their resolved values are baked into the deployed Cloud Workflows definition and cloud infrastructure. Changing them in your config (whether in base, an environment, or a profile) has no effect until you run epycloud terraform apply again.
| Keys | Purpose |
|---|---|
| google_cloud.project_id, region, bucket_name | Project infrastructure |
| docker.repo_name, image_name, image_tag | Default image URI in workflow |
| google_cloud.batch.task_count_per_node | Default tasks per VM |
| google_cloud.batch.stage_a.* | Default Stage A resources |
| google_cloud.batch.stage_b.* | Default Stage B resources |
| google_cloud.batch.stage_c.* (including run_output_stage) | Default Stage C resources and behavior |
Note
Any key not defined in your config (across all layers) falls back to the default in Terraform's variables.tf, which matches the config template defaults.
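The fallback described in this note can be pictured as a layered merge. The sketch below is illustrative Python, not epycloud's or Terraform's actual code: `deep_merge` and `resolve` are hypothetical helpers, the sample values are placeholders, and the order in which layers are passed is an assumption (the real resolution order is described in Configuring epycloud).

```python
from functools import reduce


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` applied on top of `base`.

    Nested dicts are merged key by key; neither input is mutated.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(defaults: dict, layers: list) -> dict:
    """Apply config layers in order on top of the defaults (later layers win)."""
    return reduce(deep_merge, layers, defaults)


# Stand-in for the defaults that would come from variables.tf (values from this page).
defaults = {
    "google_cloud": {
        "batch": {"task_count_per_node": 1, "stage_a": {"cpu_milli": 2000}},
    },
}
# Hypothetical layers: a base config and a profile that overrides one key.
base_layer = {"google_cloud": {"project_id": "my-gcp-project"}}
profile_layer = {"google_cloud": {"batch": {"task_count_per_node": 2}}}

resolved = resolve(defaults, [base_layer, profile_layer])
print(resolved["google_cloud"]["batch"])
```

Here `task_count_per_node` takes the profile's value, while `stage_a.cpu_milli` is set in no layer and so keeps the default.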
Image building¶
These keys are read when you run epycloud build. They determine what goes into the Docker image. Changing them requires rebuilding.
| Keys | Purpose |
|---|---|
| google_cloud.project_id, region | Registry path |
| docker.* | Image name, tag, registry |
| github.modeling_suite_repo, modeling_suite_ref | Which modeling suite version to install |
| github.personal_access_token | Auth for private repos (local/dev builds only) |
Workflow execution¶
These keys are read each time you run epycloud run workflow. Changes take effect on the next run without redeploying infrastructure. Some of these keys can also override the Terraform-baked defaults.
| Keys | Purpose | Overrides Terraform default? |
|---|---|---|
| google_cloud.project_id, region, bucket_name | Where to submit and store data | No (must match deployed infra) |
| storage.dir_prefix | GCS path prefix | N/A (runtime only) |
| docker.image_tag | Which image tag to use for this run | Yes |
| github.forecast_repo | Experiment repo to clone | N/A (runtime only) |
| github.forecast_repo_ref | Branch/tag to checkout | N/A (runtime only) |
| google_cloud.billing_project | Cost grouping label for billing reports | N/A (runtime only) |
| google_cloud.batch.max_parallelism | Max parallel tasks | Yes |
| google_cloud.batch.task_count_per_node | Tasks per VM | Yes |
| google_cloud.batch.stage_*.machine_type | Machine type per stage (empty = auto-select based on CPU/memory) | Yes (via CLI flags) |
| google_cloud.batch.stage_*.cpu_milli, memory_mib | CPU/memory per stage | Yes (via CLI flags, together with machine type) |
Tip
You can override machine types, parallelism, and image tag per run without redeploying infrastructure. This is useful for testing different resource allocations or running with a specific image version.
Environment Variable Overrides¶
Any configuration key can be overridden using environment variables with the EPYCLOUD_ prefix. Use double underscores (__) to separate nested paths. This works for every key listed on this page.
Examples:
| YAML path | Environment variable |
|---|---|
| google_cloud.project_id | EPYCLOUD_GOOGLE_CLOUD__PROJECT_ID |
| google_cloud.batch.stage_b.cpu_milli | EPYCLOUD_GOOGLE_CLOUD__BATCH__STAGE_B__CPU_MILLI |
| docker.image_tag | EPYCLOUD_DOCKER__IMAGE_TAG |
| storage.dir_prefix | EPYCLOUD_STORAGE__DIR_PREFIX |
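The naming rule in the table above can be expressed as a short sketch. This is an illustration of the mapping only, not epycloud's own parser: `env_var_name` and `overrides_from_env` are hypothetical helpers, and values are left as strings because this page does not describe how types are coerced.

```python
def env_var_name(yaml_path: str, prefix: str = "EPYCLOUD_") -> str:
    """Convert a dotted YAML path into the matching environment variable name."""
    return prefix + "__".join(part.upper() for part in yaml_path.split("."))


def overrides_from_env(environ: dict, prefix: str = "EPYCLOUD_") -> dict:
    """Collect prefixed variables into a nested dict keyed like the YAML config.

    Values stay as strings; type conversion is out of scope for this sketch.
    """
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated variables such as PATH
        path = name[len(prefix):].lower().split("__")
        node = overrides
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return overrides


print(env_var_name("google_cloud.batch.stage_b.cpu_milli"))
print(overrides_from_env({"EPYCLOUD_DOCKER__IMAGE_TAG": "v1.0.0", "PATH": "/usr/bin"}))
```

Splitting on the double underscore is what lets single underscores survive inside key names such as `image_tag` or `stage_b`.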
storage¶
Directory prefix for organizing pipeline data in GCS (or local filesystem).
| Key | Type | Default | Description |
|---|---|---|---|
| storage.dir_prefix | string | "pipeline/{environment}/{profile}" | Base directory prefix for all pipeline data. Supports template variables {environment} and {profile}, which are interpolated at runtime. |
Example paths after interpolation:
- pipeline/prod/flu/ (environment=prod, profile=flu)
- pipeline/dev/covid/ (environment=dev, profile=covid)
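The interpolation behaves like ordinary placeholder substitution. The sketch below uses Python's `str.format` as a stand-in; `resolve_dir_prefix` is a hypothetical helper, and whether epycloud appends a trailing slash itself is not specified here, so the sketch returns the prefix without one.

```python
def resolve_dir_prefix(template: str, environment: str, profile: str) -> str:
    """Fill the {environment} and {profile} placeholders in a dir_prefix template."""
    return template.format(environment=environment, profile=profile)


template = "pipeline/{environment}/{profile}"
print(resolve_dir_prefix(template, "prod", "flu"))
print(resolve_dir_prefix(template, "dev", "covid"))
```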
google_cloud¶
Google Cloud Platform project and region settings.
| Key | Type | Default | Description |
|---|---|---|---|
| google_cloud.project_id | string | (required) | Google Cloud project ID (e.g., my-gcp-project). |
| google_cloud.region | string | us-central1 | Google Cloud region for all resources (Batch jobs, GCS, Artifact Registry). |
| google_cloud.bucket_name | string | (required) | GCS bucket for pipeline input/output data. Must already exist. |
| google_cloud.billing_project | string | "" | User-defined label for cost grouping in GCP billing reports. Applied to all Cloud Batch jobs. Can be overridden per run with --billing-project. |
google_cloud.batch¶
Cloud Batch job configuration controlling parallelism and per-stage compute resources.
| Key | Type | Default | Description |
|---|---|---|---|
| google_cloud.batch.max_parallelism | integer | 100 | Maximum number of Stage B tasks running simultaneously. Cloud Batch limit is 5000. |
| google_cloud.batch.task_count_per_node | integer | 1 | Number of tasks per VM. Set to 1 for dedicated VMs per task (recommended for predictable performance). |
google_cloud.batch.stage_a¶
Compute resources for Stage A (Builder). Single-task job that generates input files for Stage B.
| Key | Type | Default | Description |
|---|---|---|---|
| google_cloud.batch.stage_a.cpu_milli | integer | 2000 | CPU allocation in millicores (2000 = 2 vCPUs). |
| google_cloud.batch.stage_a.memory_mib | integer | 8192 | Memory allocation in MiB (8192 = 8 GB). |
| google_cloud.batch.stage_a.machine_type | string | "c4d-standard-2" | Google Cloud machine type. Empty string ("") lets Cloud Batch auto-select based on CPU/memory requirements. |
| google_cloud.batch.stage_a.max_run_duration | integer | 3600 | Maximum execution time in seconds (3600 = 1 hour). Tasks exceeding this limit are terminated. |
google_cloud.batch.stage_b¶
Compute resources for Stage B (Runner). Parallel tasks, each processing one input file.
| Key | Type | Default | Description |
|---|---|---|---|
| google_cloud.batch.stage_b.cpu_milli | integer | 2000 | CPU allocation in millicores (2000 = 2 vCPUs). |
| google_cloud.batch.stage_b.memory_mib | integer | 8192 | Memory allocation in MiB (8192 = 8 GB). |
| google_cloud.batch.stage_b.machine_type | string | "" | Google Cloud machine type. Empty string lets Cloud Batch auto-select. Set explicitly (e.g., "e2-standard-2") for predictable scaling. |
| google_cloud.batch.stage_b.max_run_duration | integer | 36000 | Maximum execution time in seconds (36000 = 10 hours). See sizing guidelines below. |
google_cloud.batch.stage_c¶
Compute resources for Stage C (Output). Runs as a single task that loads all Stage B results into memory, so it typically needs more memory than the other stages.
| Key | Type | Default | Description |
|---|---|---|---|
| google_cloud.batch.stage_c.cpu_milli | integer | 4000 | CPU allocation in millicores (4000 = 4 vCPUs). |
| google_cloud.batch.stage_c.memory_mib | integer | 15360 | Memory allocation in MiB (15360 = 15 GB). |
| google_cloud.batch.stage_c.machine_type | string | "c4d-standard-4" | Google Cloud machine type. Empty string lets Cloud Batch auto-select. |
| google_cloud.batch.stage_c.max_run_duration | integer | 7200 | Maximum execution time in seconds (7200 = 2 hours). See sizing guidelines below. |
| google_cloud.batch.stage_c.run_output_stage | boolean | true | Whether to run Stage C after Stage B completes. Set to false to skip output generation (e.g., when only raw runner artifacts are needed). |
Sizing guidelines¶
Stage B (Runner):
| Workload | Recommended max_run_duration |
|---|---|
| Short simulations (< 1 hour) | 3600 |
| Medium simulations (1-5 hours) | 18000 |
| Long simulations (5-10 hours) | 36000 (default) |
| Very long simulations | Up to 604800 (7 days, Cloud Batch limit) |
Stage C (Output):
| Workload | Recommended max_run_duration | Memory guidance |
|---|---|---|
| Small runs (< 100 tasks) | 1800 (30 min) | 8 GB sufficient |
| Medium runs (100-1,000 tasks) | 7200 (default) | 8-15 GB |
| Large runs (1,000-10,000 tasks) | 14400 (4 hours) | 16-32 GB |
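The two tables above can be read as simple lookups. The sketch below encodes them in Python as a convenience, not as part of epycloud: `stage_b_max_run_duration` and `stage_c_sizing` are hypothetical helpers, and treating the upper edge of each range as inclusive is an assumption, since the tables leave the boundaries ambiguous.

```python
STAGE_B_CAP_SECONDS = 604800  # 7 days, the limit quoted in the Stage B table


def stage_b_max_run_duration(expected_hours: float) -> int:
    """Suggest a Stage B max_run_duration (seconds) from the expected runtime."""
    if expected_hours <= 0:
        raise ValueError("expected_hours must be positive")
    if expected_hours < 1:
        return 3600
    if expected_hours <= 5:
        return 18000
    if expected_hours <= 10:
        return 36000
    # The table only gives an upper bound here; tune downward as needed.
    return STAGE_B_CAP_SECONDS


def stage_c_sizing(num_tasks: int) -> tuple:
    """Suggest (max_run_duration in seconds, memory guidance) for Stage C."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    if num_tasks < 100:
        return (1800, "8 GB")
    if num_tasks <= 1000:
        return (7200, "8-15 GB")
    if num_tasks <= 10000:
        return (14400, "16-32 GB")
    raise ValueError("no guidance in the table above 10,000 tasks")


print(stage_b_max_run_duration(3))
print(stage_c_sizing(500))
```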
Machine type selection¶
When machine_type is set to a specific value (e.g., "c4d-standard-2"):
- Cloud Batch provisions that exact machine type
- cpu_milli and memory_mib act as task-level constraints (must fit within the machine)
When machine_type is empty (""):
- Cloud Batch auto-selects a VM based on cpu_milli and memory_mib
- Recommended when you don't need a specific machine family
For available machine types, pricing, and sizing recommendations, see Machine Types.
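The "must fit within the machine" constraint can be checked with simple arithmetic before submitting a run. The following is a simplified sketch: `fits_machine` is a hypothetical helper, the machine's vCPU and memory figures must be supplied by the caller from the Machine Types page, and any additional rules Cloud Batch applies when packing tasks onto a VM are not modeled.

```python
def fits_machine(cpu_milli: int, memory_mib: int,
                 machine_vcpus: int, machine_memory_gb: float) -> bool:
    """Check whether one task's CPU and memory request fits on a machine.

    1 vCPU = 1000 millicores; machine memory is converted at 1 GB = 1024 MiB,
    matching the conversions used in the tables on this page.
    """
    cpu_ok = cpu_milli <= machine_vcpus * 1000
    memory_ok = memory_mib <= machine_memory_gb * 1024
    return cpu_ok and memory_ok


# e2-standard-2 offers 2 vCPUs and 8 GB of memory.
print(fits_machine(2000, 8192, machine_vcpus=2, machine_memory_gb=8))
print(fits_machine(4000, 15360, machine_vcpus=2, machine_memory_gb=8))
```

A request that fails this check on the chosen machine_type is a sign to either lower cpu_milli and memory_mib or pick a larger machine.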
docker¶
Docker image configuration for the pipeline container.
| Key | Type | Default | Description |
|---|---|---|---|
| docker.registry | string | "us-central1-docker.pkg.dev" | Container registry URL. For Artifact Registry, format is {region}-docker.pkg.dev. |
| docker.repo_name | string | "epymodelingsuite-repo" | Artifact Registry repository name. |
| docker.image_name | string | "epymodelingsuite" | Docker image name. |
| docker.image_tag | string | "latest" | Docker image tag. Use specific tags (e.g., v1.0.0) in production. |
The full image URI is constructed as:
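Assuming the standard Artifact Registry layout of registry, project, repository, image, and tag, the URI can be assembled from the keys above as in this sketch. `image_uri` is a hypothetical helper rather than an epycloud function, and the project ID shown is the placeholder used elsewhere on this page.

```python
def image_uri(registry: str, project_id: str, repo_name: str,
              image_name: str, image_tag: str) -> str:
    """Build an image URI in the Artifact Registry form
    {registry}/{project_id}/{repo_name}/{image_name}:{image_tag}."""
    return f"{registry}/{project_id}/{repo_name}/{image_name}:{image_tag}"


uri = image_uri(
    registry="us-central1-docker.pkg.dev",   # docker.registry
    project_id="my-gcp-project",             # google_cloud.project_id
    repo_name="epymodelingsuite-repo",       # docker.repo_name
    image_name="epymodelingsuite",           # docker.image_name
    image_tag="latest",                      # docker.image_tag
)
print(uri)
```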
github¶
GitHub repository references for the modeling suite package and experiment data.
| Key | Type | Default | Description |
|---|---|---|---|
| github.modeling_suite_repo | string | "mobs-lab/epymodelingsuite" | GitHub repository for the modeling suite package (format: owner/repo). Cloned during Docker build. |
| github.modeling_suite_ref | string | "main" | Branch, tag, or commit to use when building the Docker image. |
| github.forecast_repo | string | (profile-specific) | GitHub repository for experiment data (format: owner/repo). Typically set in profile configs. Cloned at runtime by Stage A and Stage C. |
| github.forecast_repo_ref | string | "" | Branch, tag, or commit to checkout after cloning the forecast repo. Empty string uses the repository's default branch. |
| github.personal_access_token | string | (secrets.yaml) | GitHub PAT for accessing private repositories. Store in secrets.yaml, not in config.yaml. See Secrets. |
logging¶
Logging configuration for pipeline scripts and the CLI.
| Key | Type | Default | Description |
|---|---|---|---|
| logging.level | string | "INFO" | Log level. One of: DEBUG, INFO, WARNING, ERROR. |
| logging.storage_verbose | boolean | true | Enable verbose logging for storage operations (uploads, downloads, listings). |
workflow¶
Cloud Workflows orchestration settings.
| Key | Type | Default | Description |
|---|---|---|---|
| workflow.retry_policy.max_attempts | integer | 3 | Maximum retry attempts for failed workflow steps. |
| workflow.retry_policy.backoff_seconds | integer | 60 | Backoff duration in seconds between retries. |
| workflow.notification.enabled | boolean | false | Enable workflow completion/failure notifications. |
| workflow.notification.email | string | null | Email address for workflow notifications. Requires notification.enabled: true. |
Runtime Environment Variables
For pipeline runtime variables (NUM_TASKS, ALLOW_PARTIAL_RESULTS, BATCH_TASK_INDEX, etc.) that are not part of the configuration file system, see Environment Variables.
Complete Template¶
For reference, here is the full default config.yaml template:
```yaml
# Storage configuration
storage:
  dir_prefix: "pipeline/{environment}/{profile}"

# Google Cloud Platform configuration
google_cloud:
  project_id: your-gcp-project-id
  region: us-central1
  bucket_name: your-bucket-name
  billing_project: ""
  batch:
    max_parallelism: 100
    task_count_per_node: 1
    stage_a:
      cpu_milli: 2000
      memory_mib: 8192
      machine_type: "c4d-standard-2"
      max_run_duration: 3600
    stage_b:
      cpu_milli: 2000
      memory_mib: 8192
      machine_type: ""
      max_run_duration: 36000
    stage_c:
      cpu_milli: 4000
      memory_mib: 15360
      machine_type: "c4d-standard-4"
      max_run_duration: 7200
      run_output_stage: true

# Docker image configuration
docker:
  registry: "us-central1-docker.pkg.dev"
  repo_name: epymodelingsuite-repo
  image_name: epymodelingsuite
  image_tag: latest

# GitHub repositories
github:
  modeling_suite_repo: mobs-lab/epymodelingsuite
  modeling_suite_ref: main
  forecast_repo_ref: ""

# Logging configuration
logging:
  level: INFO
  storage_verbose: true

# Workflow configuration
workflow:
  retry_policy:
    max_attempts: 3
    backoff_seconds: 60
  notification:
    enabled: false
    email: null
```