
Cloud Batch

Google Cloud Batch is a serverless compute service that provisions VMs and runs containers. The pipeline uses Cloud Batch to execute all stages.

Batch documentation | Compute Engine | Google Cloud (docs.cloud.google.com)

Jobs and tasks

A Batch job is a unit of work you submit to Cloud Batch. Each job specifies a container image, resource requirements, and one or more tasks to run. Tasks within a job run independently, each in its own container.

Every task is assigned a zero-based task index, exposed to its container as the BATCH_TASK_INDEX environment variable. The pipeline uses this index to determine which input file to process (e.g., task 0 reads input_0000.pkl, task 5 reads input_0005.pkl).
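A minimal sketch of how a task could turn its index into an input file name, assuming the input_NNNN.pkl pattern from the examples above (prefix "input_", four-digit zero padding). The helper names input_filename and current_task_index are hypothetical and not part of the pipeline or of Cloud Batch.

```python
import os


def input_filename(task_index: int, prefix: str = "input_", width: int = 4) -> str:
    """Map a zero-based Batch task index to its input file name.

    Hypothetical helper: the prefix and padding follow the example names above;
    the real pipeline's naming may differ.
    """
    if task_index < 0:
        raise ValueError("task index must be non-negative")
    return f"{prefix}{task_index:0{width}d}.pkl"


def current_task_index() -> int:
    """Read the task index that Cloud Batch sets for each task."""
    # Assumption: default to 0 so the script can also run locally, outside Batch.
    return int(os.environ.get("BATCH_TASK_INDEX", "0"))


if __name__ == "__main__":
    print(input_filename(current_task_index()))
```

Because the index is the only per-task input, every task in Stage B can run the same container image and command; only the environment variable differs.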

| Stage | Tasks | Purpose |
| --- | --- | --- |
| Stage A (Builder) | 1 | Generate N input files |
| Stage B (Runner) | N (one per input file) | Process inputs in parallel |
| Stage C (Output) | 1 | Aggregate all results |

A job progresses through these states: QUEUED → SCHEDULED → RUNNING → SUCCEEDED or FAILED.
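When a script polls a job, it mainly needs to know whether the state is terminal and whether the observed sequence makes sense. A minimal sketch that models only the states listed above; the helper names are hypothetical, and the real Batch API defines additional states (for example, for cancellation and deletion) that this sketch ignores.

```python
# Only the states listed above are modelled; other Batch API states are ignored.
STATE_RANK = {"QUEUED": 0, "SCHEDULED": 1, "RUNNING": 2, "SUCCEEDED": 3, "FAILED": 3}
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def is_terminal(state: str) -> bool:
    """True for states after which the job no longer changes (in this model)."""
    return state in TERMINAL_STATES


def is_valid_progression(observed: list[str]) -> bool:
    """Check polled states against QUEUED -> SCHEDULED -> RUNNING -> SUCCEEDED/FAILED.

    Polling may miss intermediate states, so skipping forward is allowed;
    moving backwards or leaving a terminal state is not.
    """
    previous = None
    for state in observed:
        if state not in STATE_RANK:
            raise ValueError(f"unknown state: {state}")
        if previous is not None:
            if STATE_RANK[state] < STATE_RANK[previous]:
                return False
            if is_terminal(previous) and state != previous:
                return False
        previous = state
    return True
```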

For more details on Batch components:

Get started with Batch | Google Cloud (docs.cloud.google.com)

Parallelism

Parallelism controls how many tasks run at the same time within a job. If a job has 1,000 tasks and parallelism is set to 100, Cloud Batch runs up to 100 tasks concurrently and queues the rest.
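If every task took the same time, the job would need ceil(tasks / parallelism) back-to-back rounds; for the example above that is 1,000 / 100 = 10 rounds. The sketch below (hypothetical helper names) turns that into a rough wall-time estimate. Tasks of uneven length do not line up into clean rounds, and VM provisioning adds time, so treat the result as an approximation only.

```python
def estimated_waves(task_count: int, parallelism: int) -> int:
    """Rounds of concurrent tasks needed if every task takes the same time."""
    if task_count < 0 or parallelism < 1:
        raise ValueError("need task_count >= 0 and parallelism >= 1")
    # Integer ceiling division: ceil(task_count / parallelism).
    return -(-task_count // parallelism)


def estimated_wall_time(task_count: int, parallelism: int, task_seconds: float) -> float:
    """Rough job duration, ignoring VM provisioning and scheduling delays."""
    return estimated_waves(task_count, parallelism) * task_seconds


print(estimated_waves(1000, 100))            # 10
print(estimated_wall_time(1000, 100, 60.0))  # 600.0
```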

| Setting | Value |
| --- | --- |
| Default | 100 |
| Cloud Batch maximum | 5,000 per job |
| Configuration key | google_cloud.batch.max_parallelism |
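A minimal sketch of resolving the configuration key, assuming the pipeline configuration has been loaded into a nested dictionary (for example, parsed from YAML). The helpers get_dotted and resolve_parallelism are hypothetical; the default of 100 and the limit of 5,000 come from the table above.

```python
from collections.abc import Mapping
from typing import Any

DEFAULT_PARALLELISM = 100
BATCH_PARALLELISM_LIMIT = 5_000  # per-job maximum from the table above


def get_dotted(config: Mapping[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Look up "a.b.c" in a nested mapping, returning default if any level is missing."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


def resolve_parallelism(config: Mapping[str, Any]) -> int:
    """Return the configured parallelism, falling back to the default."""
    value = int(get_dotted(config, "google_cloud.batch.max_parallelism", DEFAULT_PARALLELISM))
    # Fail fast instead of silently clamping, so a bad config is noticed early.
    if not 1 <= value <= BATCH_PARALLELISM_LIMIT:
        raise ValueError(
            f"max_parallelism must be between 1 and {BATCH_PARALLELISM_LIMIT}, got {value}"
        )
    return value
```

Raising on out-of-range values is a design choice; clamping to the limit would also work but hides configuration mistakes.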

For more details on parallelism:

Job creation and execution overview | Google Cloud (docs.cloud.google.com)

Resource units

Cloud Batch uses two units to express compute requirements per task: cpuMilli and memoryMib. These values are set per stage in configuration (e.g., google_cloud.batch.stage_b.cpu_milli). See Configuration Variables for all options, and Machine Types for how these interact with machine type selection.
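To show where these values end up, here is a minimal sketch that builds a Batch job body for one stage from per-stage configuration. The field names (taskGroups, taskCount, parallelism, taskSpec, runnables, container.imageUri, computeResource, cpuMilli, memoryMib) follow the Batch v1 Job REST resource. The config key memory_mib, the helper build_stage_job and the image URI are assumptions for illustration; only cpu_milli is named above.

```python
import json
from collections.abc import Mapping
from typing import Any


def build_stage_job(
    config: Mapping[str, Any], stage: str, image_uri: str, task_count: int
) -> dict:
    """Build a minimal Batch job body for one pipeline stage (hypothetical helper).

    Reads per-stage resources from config["google_cloud"]["batch"][stage].
    Assumption: memory is stored under "memory_mib"; only "cpu_milli" is documented.
    """
    batch_cfg = config["google_cloud"]["batch"]
    stage_cfg = batch_cfg[stage]
    return {
        "taskGroups": [
            {
                "taskCount": task_count,
                "parallelism": batch_cfg.get("max_parallelism", 100),
                "taskSpec": {
                    "runnables": [{"container": {"imageUri": image_uri}}],
                    "computeResource": {
                        "cpuMilli": int(stage_cfg["cpu_milli"]),
                        "memoryMib": int(stage_cfg["memory_mib"]),
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    example_config = {
        "google_cloud": {
            "batch": {
                "max_parallelism": 100,
                "stage_b": {"cpu_milli": 2000, "memory_mib": 4096},
            }
        }
    }
    # Hypothetical image path; Stage B runs one task per input file.
    job = build_stage_job(
        example_config, "stage_b", "REGION-docker.pkg.dev/PROJECT/REPO/runner:latest", 1000
    )
    print(json.dumps(job, indent=2))
```

The printed JSON could be saved to a file and submitted with gcloud batch jobs submit JOB_NAME --location=LOCATION --config=job.json; a production job body would typically also set logging and allocation policies.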

cpuMilli: CPU allocation in thousandths of a vCPU.

| cpuMilli | vCPUs |
| --- | --- |
| 1000 | 1 vCPU |
| 2000 | 2 vCPUs |
| 4000 | 4 vCPUs |

memoryMib: Memory allocation in mebibytes (MiB).

| memoryMib | Memory |
| --- | --- |
| 2048 | 2 GiB |
| 4096 | 4 GiB |
| 8192 | 8 GiB |
| 15360 | 15 GiB |
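Converting human-friendly sizes into these units is simple arithmetic (1 vCPU = 1000 cpuMilli, 1 GiB = 1024 MiB). The helpers below are hypothetical; the last one converts back to decimal gigabytes, since 2048 MiB is exactly 2 GiB but about 2.15 GB.

```python
def vcpus_to_cpu_milli(vcpus: float) -> int:
    """Convert vCPUs to Batch cpuMilli (1 vCPU = 1000 cpuMilli)."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    # round() absorbs floating-point error from fractional inputs.
    return round(vcpus * 1000)


def gib_to_memory_mib(gib: float) -> int:
    """Convert gibibytes to Batch memoryMib (1 GiB = 1024 MiB)."""
    if gib <= 0:
        raise ValueError("gib must be positive")
    return round(gib * 1024)


def memory_mib_to_gb(memory_mib: int) -> float:
    """Convert MiB to decimal gigabytes (1 GB = 10**9 bytes)."""
    return memory_mib * 1024 * 1024 / 10**9


print(vcpus_to_cpu_milli(2))             # 2000
print(gib_to_memory_mib(15))             # 15360
print(round(memory_mib_to_gb(2048), 2))  # 2.15
```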

For more information on compute resource options:

ComputeResource | REST Resource: projects.locations.jobs | Google Cloud (docs.cloud.google.com)

Further reading