Skip to content

Running Workflows

Build Docker images and execute your first workflow on Google Cloud.

Tips

Make sure you've completed Setup and have infrastructure deployed before continuing.

Step 1: Build Docker image

The pipeline runs inside Docker containers on Google Cloud. The Docker image packages the epymodelingsuite library and pipeline scripts into a portable environment that Cloud Batch uses to execute each stage. You need to build and push this image before running any cloud workflow.

Submit an asynchronous build to Google Cloud Build:

$ epycloud build cloud

This submits the build to Cloud Build and returns immediately with a build ID. The image is built remotely and pushed to Artifact Registry automatically.

Monitor build status:

$ epycloud build status

Builds typically take few minutes. Wait for the build to succeed before proceeding.

You can also check build status and view built images in the Cloud Console:

See epycloud build for all build options including local builds.

When to rebuild
  • After updating epymodelingsuite (new version or different branch)
  • After changing pipeline scripts in docker/scripts/
  • When using an environment with a different modeling_suite_ref or image_tag (see Using Environments)

Step 2: Prepare experiment configuration

Your experiment configuration must exist in the experiment repository (configured in your profile's forecast_repo). See Experiment Repository for how to set up the repository.

Push to the repository

Since Stage A clones the repository at runtime, your experiment must be on the default branch (usually main). We recommend using a branch and pull request workflow:

$ cd /path/to/experiment-repo
$ git checkout -b add-my-experiment-001
$ git add experiments/my-experiment-001/
$ git commit -m "Add my-experiment-001 config"
$ git push origin add-my-experiment-001

Then create a pull request on GitHub, review the configuration, and merge to main.

Validate configuration (optional)

Before submitting, you can validate your experiment configuration locally:

$ epycloud validate --exp-id my-experiment-001

This checks that the configuration files are valid and can be parsed by the pipeline.

Step 3: Submit workflow

We are finally ready to submit the workflow to cloud.

$ epycloud run workflow --exp-id my-experiment-001

This submits a Cloud Workflow that orchestrates the full pipeline:

  1. Stage A (Builder): Generates task inputs from experiment configuration
  2. Stage B (Runner): Runs simulations/calibrations in parallel
  3. Stage C (Output): Aggregates results into CSV outputs

The command returns an execution ID:

Execution ID: abcdef12-3456-7890

Step 4: Monitor progress

Check the status of active workflows and jobs:

$ epycloud status

Use watch mode for continuous monitoring:

$ epycloud status -w

See epycloud status for more options.

Additional monitoring commands

View detailed status of a specific execution:

$ epycloud workflow describe abcdef12-3456-7890

Stream logs in real-time:

$ epycloud logs --exp-id my-experiment-001 --follow

Filter logs by stage:

$ epycloud logs --exp-id my-experiment-001 --stage A
$ epycloud logs --exp-id my-experiment-001 --stage B
$ epycloud logs --exp-id my-experiment-001 --stage C

Step 5: View results

Results are stored in GCS at the path configured by your dir_prefix and experiment ID.

List runs:

$ gsutil ls gs://my-bucket-name/pipeline/flu/my-experiment-001/

Download results:

$ gsutil -m cp -r gs://my-bucket-name/pipeline/flu/my-experiment-001/RUN-ID/outputs/ ./results/

Output files include:

  • quantiles_*.csv.gz - Quantile summaries
  • trajectories_*.csv.gz - Individual trajectories
  • metadata_*.csv.gz - Run metadata

Managing workflows

Cancel a workflow

Cancel the workflow and its associated batch jobs:

$ epycloud workflow cancel abcdef12-3456-7890

Cancel only the workflow (keep batch jobs running):

$ epycloud workflow cancel abcdef12-3456-7890 --only-workflow

View history

$ epycloud workflow list --exp-id my-experiment-001 --limit 50

Export logs

$ epycloud --no-color logs --exp-id my-experiment-001 --tail 0 > logs.txt

Troubleshooting

Build fails

"Permission denied" during Cloud Build

  • Verify your user has the Cloud Build Editor role (see Prerequisites)

Build succeeds but image not found

  • Check that the image tag matches your config: epycloud config show | grep image_tag

Workflow submission fails

"Workflow not found"

  • Infrastructure not deployed. Run epycloud terraform apply.

"Docker image not found"

  • Image not built or tag mismatch. Run epycloud build cloud.

"Permission denied"

  • Verify APIs are enabled and service accounts have correct permissions

Stage B tasks fail

Tasks timeout or run out of memory

  • Increase resources in config (stage_b.cpu_milli, stage_b.memory_mib)
  • Run epycloud terraform apply to update

High costs

Unexpected billing charges

Next Steps