Configuration Variables

Complete reference for all configuration keys, their types, and default values. These keys can be set in any configuration file (base config.yaml, environments, profiles, or project config).

For an introduction to the configuration system, file locations, and resolution order, see Configuring epycloud.

When each configuration key is used

Configuration keys are consumed at three different points: infrastructure deployment, image building, and workflow execution. Knowing which point consumes a key tells you whether a change requires redeploying infrastructure, rebuilding the image, or just re-running a workflow.

Infrastructure deployment

These keys are read when you run epycloud terraform apply. Their resolved values are baked into the deployed Cloud Workflows definition and the supporting cloud infrastructure. Changing them in your config (whether in base, an environment, or a profile) has no effect until you run terraform apply again.

| Keys | Purpose |
| --- | --- |
| google_cloud.project_id, region, bucket_name | Project infrastructure |
| docker.repo_name, image_name, image_tag | Default image URI in workflow |
| google_cloud.batch.task_count_per_node | Default tasks per VM |
| google_cloud.batch.stage_a.* | Default Stage A resources |
| google_cloud.batch.stage_b.* | Default Stage B resources |
| google_cloud.batch.stage_c.* (including run_output_stage) | Default Stage C resources and behavior |

Note

Any key not defined in your config (across all layers) falls back to the default in Terraform's variables.tf, which matches the config template defaults.
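
For example, changing a baked-in default such as google_cloud.batch.stage_b.cpu_milli only takes effect after redeploying (a minimal sketch):

$EDITOR config.yaml        # e.g., raise google_cloud.batch.stage_b.cpu_milli
epycloud terraform apply   # bakes the new default into the deployed workflow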

Image building

These keys are read when you run epycloud build. They determine what goes into the Docker image. Changing them requires rebuilding.

| Keys | Purpose |
| --- | --- |
| google_cloud.project_id, region | Registry path |
| docker.* | Image name, tag, registry |
| github.modeling_suite_repo, modeling_suite_ref | Which modeling suite version to install |
| github.personal_access_token | Auth for private repos (local/dev builds only) |
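
For example, a sketch of a build pinned to a specific modeling suite version, using the environment-variable overrides described below (v2.1.0 is a hypothetical tag):

EPYCLOUD_GITHUB__MODELING_SUITE_REF=v2.1.0 \
EPYCLOUD_DOCKER__IMAGE_TAG=v2.1.0 \
epycloud build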

Workflow execution

These keys are read each time you run epycloud run workflow. Changes take effect immediately on the next run without redeploying infrastructure. Some of these can also override the terraform-baked defaults.

| Keys | Purpose | Overrides terraform default? |
| --- | --- | --- |
| google_cloud.project_id, region, bucket_name | Where to submit and store data | No (must match deployed infra) |
| storage.dir_prefix | GCS path prefix | N/A (runtime only) |
| docker.image_tag | Which image tag to use for this run | Yes |
| github.forecast_repo | Experiment repo to clone | N/A (runtime only) |
| github.forecast_repo_ref | Branch/tag to checkout | N/A (runtime only) |
| google_cloud.billing_project | Cost grouping label for billing reports | N/A (runtime only) |
| google_cloud.batch.max_parallelism | Max parallel tasks | Yes |
| google_cloud.batch.task_count_per_node | Tasks per VM | Yes |
| google_cloud.batch.stage_*.machine_type | Machine type per stage (empty = auto-select based on CPU/memory) | Yes (via CLI flags) |
| google_cloud.batch.stage_*.cpu_milli, memory_mib | CPU/memory per stage | Yes (via CLI flags, together with machine type) |

Tip

You can override machine types, parallelism, and image tag per run without redeploying infrastructure. This is useful for testing different resource allocations or running with a specific image version.
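
For example, a one-off run with a pinned image and reduced parallelism, leaving all config files untouched (a sketch using the environment-variable overrides described in the next section; the tag is illustrative):

EPYCLOUD_DOCKER__IMAGE_TAG=v1.0.0 \
EPYCLOUD_GOOGLE_CLOUD__BATCH__MAX_PARALLELISM=10 \
epycloud run workflow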

Environment Variable Overrides

Any configuration key can be overridden using environment variables with the EPYCLOUD_ prefix. Use double underscores (__) to separate nested paths. This works for every key listed on this page.

Examples:

| YAML path | Environment variable |
| --- | --- |
| google_cloud.project_id | EPYCLOUD_GOOGLE_CLOUD__PROJECT_ID |
| google_cloud.batch.stage_b.cpu_milli | EPYCLOUD_GOOGLE_CLOUD__BATCH__STAGE_B__CPU_MILLI |
| docker.image_tag | EPYCLOUD_DOCKER__IMAGE_TAG |
| storage.dir_prefix | EPYCLOUD_STORAGE__DIR_PREFIX |
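
In a shell, an override can also be exported so it applies to every subsequent command in the session (the value is illustrative):

export EPYCLOUD_GOOGLE_CLOUD__BATCH__STAGE_B__CPU_MILLI=4000
epycloud run workflow   # Stage B now requests 4 vCPUs for this run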

storage

Directory prefix for organizing pipeline data in GCS (or local filesystem).

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| storage.dir_prefix | string | "pipeline/{environment}/{profile}" | Base directory prefix for all pipeline data. Supports template variables {environment} and {profile}, which are interpolated at runtime. |

Example paths after interpolation:

  • pipeline/prod/flu/ (environment=prod, profile=flu)
  • pipeline/dev/covid/ (environment=dev, profile=covid)

google_cloud

Google Cloud Platform project and region settings.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| google_cloud.project_id | string | (required) | Google Cloud project ID (e.g., my-gcp-project). |
| google_cloud.region | string | us-central1 | Google Cloud region for all resources (Batch jobs, GCS, Artifact Registry). |
| google_cloud.bucket_name | string | (required) | GCS bucket for pipeline input/output data. Must already exist. |
| google_cloud.billing_project | string | "" | User-defined label for cost grouping in GCP billing reports. Applied to all Cloud Batch jobs. Can be overridden per run with --billing-project. |

google_cloud.batch

Cloud Batch job configuration controlling parallelism and per-stage compute resources.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| google_cloud.batch.max_parallelism | integer | 100 | Maximum number of Stage B tasks running simultaneously. Cloud Batch limit is 5000. |
| google_cloud.batch.task_count_per_node | integer | 1 | Number of tasks per VM. Set to 1 for dedicated VMs per task (recommended for predictable performance). |

google_cloud.batch.stage_a

Compute resources for Stage A (Builder). Single-task job that generates input files for Stage B.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| google_cloud.batch.stage_a.cpu_milli | integer | 2000 | CPU allocation in millicores (2000 = 2 vCPUs). |
| google_cloud.batch.stage_a.memory_mib | integer | 8192 | Memory allocation in MiB (8192 = 8 GiB). |
| google_cloud.batch.stage_a.machine_type | string | "c4d-standard-2" | Google Cloud machine type. Empty string ("") lets Cloud Batch auto-select based on CPU/memory requirements. |
| google_cloud.batch.stage_a.max_run_duration | integer | 3600 | Maximum execution time in seconds (3600 = 1 hour). Tasks exceeding this limit are terminated. |

google_cloud.batch.stage_b

Compute resources for Stage B (Runner). Parallel tasks, each processing one input file.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| google_cloud.batch.stage_b.cpu_milli | integer | 2000 | CPU allocation in millicores (2000 = 2 vCPUs). |
| google_cloud.batch.stage_b.memory_mib | integer | 8192 | Memory allocation in MiB (8192 = 8 GiB). |
| google_cloud.batch.stage_b.machine_type | string | "" | Google Cloud machine type. Empty string lets Cloud Batch auto-select. Set explicitly (e.g., "e2-standard-2") for predictable scaling. |
| google_cloud.batch.stage_b.max_run_duration | integer | 36000 | Maximum execution time in seconds (36000 = 10 hours). See sizing guidelines below. |

google_cloud.batch.stage_c

Compute resources for Stage C (Output). Runs as a single task that loads all Stage B results into memory, so it typically needs more memory than the other stages.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| google_cloud.batch.stage_c.cpu_milli | integer | 4000 | CPU allocation in millicores (4000 = 4 vCPUs). |
| google_cloud.batch.stage_c.memory_mib | integer | 15360 | Memory allocation in MiB (15360 = 15 GiB). |
| google_cloud.batch.stage_c.machine_type | string | "c4d-standard-4" | Google Cloud machine type. Empty string lets Cloud Batch auto-select. |
| google_cloud.batch.stage_c.max_run_duration | integer | 7200 | Maximum execution time in seconds (7200 = 2 hours). See sizing guidelines below. |
| google_cloud.batch.stage_c.run_output_stage | boolean | true | Whether to run Stage C after Stage B completes. Set to false to skip output generation (e.g., when only raw runner artifacts are needed). |
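
For example, a config layer that skips output generation entirely (a sketch using run_output_stage from the table above):

google_cloud:
  batch:
    stage_c:
      run_output_stage: false   # keep only the raw Stage B artifacts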

Sizing guidelines

Stage B (Runner):

| Workload | Recommended max_run_duration |
| --- | --- |
| Short simulations (< 1 hour) | 3600 |
| Medium simulations (1-5 hours) | 18000 |
| Long simulations (5-10 hours) | 36000 (default) |
| Very long simulations | Up to 604800 (7 days, Cloud Batch limit) |

Stage C (Output):

| Workload | Recommended max_run_duration | Memory guidance |
| --- | --- | --- |
| Small runs (< 100 tasks) | 1800 (30 min) | 8 GB sufficient |
| Medium runs (100-1,000 tasks) | 7200 (default) | 8-15 GB |
| Large runs (1,000-10,000 tasks) | 14400 (4 hours) | 16-32 GB |
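
Putting the guidance together, a profile for a run with thousands of tasks, each lasting a few hours, might look like this (a sketch; the values are illustrative picks from the ranges above):

google_cloud:
  batch:
    stage_b:
      max_run_duration: 18000   # up to 5-hour simulations
    stage_c:
      memory_mib: 24576         # 24 GiB, within the 16-32 GB guidance for large runs
      max_run_duration: 14400   # 4 hours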

Machine type selection

When machine_type is set to a specific value (e.g., "c4d-standard-2"):

  • Cloud Batch provisions that exact machine type
  • cpu_milli and memory_mib act as task-level constraints (must fit within the machine)

When machine_type is empty (""):

  • Cloud Batch auto-selects a VM based on cpu_milli and memory_mib
  • Recommended when you don't need a specific machine family

For available machine types, pricing, and sizing recommendations, see Machine Types.
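
The two modes side by side, as a config sketch (the machine type is taken from the defaults on this page):

google_cloud:
  batch:
    stage_a:
      machine_type: "c4d-standard-2"   # exact type; cpu_milli/memory_mib must fit within it
    stage_b:
      machine_type: ""                 # auto-select from cpu_milli and memory_mib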

docker

Docker image configuration for the pipeline container.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| docker.registry | string | "us-central1-docker.pkg.dev" | Container registry URL. For Artifact Registry, the format is {region}-docker.pkg.dev. |
| docker.repo_name | string | "epymodelingsuite-repo" | Artifact Registry repository name. |
| docker.image_name | string | "epymodelingsuite" | Docker image name. |
| docker.image_tag | string | "latest" | Docker image tag. Use specific tags (e.g., v1.0.0) in production. |

The full image URI is constructed as:

{registry}/{google_cloud.project_id}/{repo_name}/{image_name}:{image_tag}
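
With the defaults on this page and a placeholder project ID of my-gcp-project, this resolves to:

us-central1-docker.pkg.dev/my-gcp-project/epymodelingsuite-repo/epymodelingsuite:latest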

github

GitHub repository references for the modeling suite package and experiment data.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| github.modeling_suite_repo | string | "mobs-lab/epymodelingsuite" | GitHub repository for the modeling suite package (format: owner/repo). Cloned during Docker build. |
| github.modeling_suite_ref | string | "main" | Branch, tag, or commit to use when building the Docker image. |
| github.forecast_repo | string | (profile-specific) | GitHub repository for experiment data (format: owner/repo). Typically set in profile configs. Cloned at runtime by Stage A and Stage C. |
| github.forecast_repo_ref | string | "" | Branch, tag, or commit to checkout after cloning the forecast repo. Empty string uses the repository's default branch. |
| github.personal_access_token | string | (secrets.yaml) | GitHub PAT for accessing private repositories. Store in secrets.yaml, not in config.yaml. See Secrets. |
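
Assuming secrets.yaml mirrors the config layout (see Secrets for the authoritative format), storing the PAT might look like this sketch:

github:
  personal_access_token: "ghp_your-token-here"   # placeholder; never commit real tokens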

logging

Logging configuration for pipeline scripts and the CLI.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| logging.level | string | "INFO" | Log level. One of: DEBUG, INFO, WARNING, ERROR. |
| logging.storage_verbose | boolean | true | Enable verbose logging for storage operations (uploads, downloads, listings). |

workflow

Cloud Workflows orchestration settings.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| workflow.retry_policy.max_attempts | integer | 3 | Maximum retry attempts for failed workflow steps. |
| workflow.retry_policy.backoff_seconds | integer | 60 | Backoff duration in seconds between retries. |
| workflow.notification.enabled | boolean | false | Enable workflow completion/failure notifications. |
| workflow.notification.email | string | null | Email address for workflow notifications. Requires workflow.notification.enabled: true. |
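
For example, enabling completion/failure emails (the address is a placeholder):

workflow:
  notification:
    enabled: true
    email: team@example.org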

Runtime Environment Variables

For pipeline runtime variables (NUM_TASKS, ALLOW_PARTIAL_RESULTS, BATCH_TASK_INDEX, etc.) that are not part of the configuration file system, see Environment Variables.

Complete Template

For reference, here is the full default config.yaml template:

config.yaml
# Storage configuration
storage:
  dir_prefix: "pipeline/{environment}/{profile}"

# Google Cloud Platform configuration
google_cloud:
  project_id: your-gcp-project-id
  region: us-central1
  bucket_name: your-bucket-name
  billing_project: ""

  batch:
    max_parallelism: 100
    task_count_per_node: 1

    stage_a:
      cpu_milli: 2000
      memory_mib: 8192
      machine_type: "c4d-standard-2"
      max_run_duration: 3600

    stage_b:
      cpu_milli: 2000
      memory_mib: 8192
      machine_type: ""
      max_run_duration: 36000

    stage_c:
      cpu_milli: 4000
      memory_mib: 15360
      machine_type: "c4d-standard-4"
      max_run_duration: 7200
      run_output_stage: true

# Docker image configuration
docker:
  registry: "us-central1-docker.pkg.dev"
  repo_name: epymodelingsuite-repo
  image_name: epymodelingsuite
  image_tag: latest

# GitHub repositories
github:
  modeling_suite_repo: mobs-lab/epymodelingsuite
  modeling_suite_ref: main
  forecast_repo_ref: ""

# Logging configuration
logging:
  level: INFO
  storage_verbose: true

# Workflow configuration
workflow:
  retry_policy:
    max_attempts: 3
    backoff_seconds: 60
  notification:
    enabled: false
    email: null