MLOps

Batch Processing

Definition

Batch processing in AI systems runs inference or training on large datasets in scheduled jobs, prioritizing throughput and cost efficiency over real-time response.

Why It Matters

Batch processing is often the right choice for AI workloads, even though real-time inference gets more attention. Many AI use cases don't need sub-second responses: generating daily reports, processing overnight uploads, scoring customer segments, or embedding document libraries.

The economics favor batch processing whenever the workload can tolerate latency. You can use cheaper hardware, process during off-peak hours, and optimize for throughput rather than response time. Batch jobs also enable better GPU utilization, since packing requests into large batches maximizes the parallel processing capabilities that make GPUs efficient.

For AI engineers, choosing between batch and real-time processing is an architecture decision that affects cost, complexity, and user experience. Defaulting to real-time APIs when batch processing would suffice leads to over-engineered, expensive systems.

Implementation Basics

Batch processing patterns:

  • Scheduled jobs run at fixed intervals (hourly, daily) via cron or workflow orchestrators
  • Queue-based processing pulls items from a queue (SQS, RabbitMQ) in batches (sketched below)
  • Map-reduce patterns distribute large datasets across workers for parallel processing
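
The queue-based pattern can be sketched in a few lines of plain Python. This is a minimal in-process sketch, assuming the standard-library queue.Queue stands in for a managed queue like SQS or RabbitMQ and score_batch is a placeholder for the real model call; the batch size and wait time are illustrative.

    import queue
    import time

    BATCH_SIZE = 32          # illustrative; tune to the model and hardware
    MAX_WAIT_SECONDS = 5.0   # flush a partial batch after this long

    def score_batch(items):
        # Placeholder for the real model call; returns one score per item.
        return [len(str(item)) for item in items]

    def consume(work_queue):
        """Pull items in batches: fill up to BATCH_SIZE, or flush whatever
        arrived once MAX_WAIT_SECONDS has passed."""
        while True:
            batch = []
            deadline = time.monotonic() + MAX_WAIT_SECONDS
            while len(batch) < BATCH_SIZE and time.monotonic() < deadline:
                try:
                    batch.append(work_queue.get(timeout=0.1))
                except queue.Empty:
                    pass
            if not batch:
                continue  # nothing arrived in this window; keep polling
            for item, score in zip(batch, score_batch(batch)):
                print(item, score)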

Optimizing batch inference:

  • Larger batch sizes improve GPU utilization (128-512 samples typical for LLMs)
  • Dynamic batching groups requests arriving within a time window
  • Prefetching loads next batch while processing current batch
  • Mixed precision (FP16) can roughly double throughput with minimal accuracy impact (see the sketch after this list)
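
These optimizations map onto a short PyTorch loop. The sketch below is illustrative rather than a complete pipeline: the linear model, random data, and batch size of 256 are placeholders; DataLoader workers provide prefetching of the next batch, and FP16 autocast is enabled only when a GPU is available.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def main():
        device = "cuda" if torch.cuda.is_available() else "cpu"

        # Dummy data and model; in practice, load your dataset and trained model.
        dataset = TensorDataset(torch.randn(10_000, 768))
        model = torch.nn.Linear(768, 1).to(device).eval()

        # Large batches plus background workers: prefetch_factor keeps the
        # next batches loading while the current one is being scored.
        loader = DataLoader(dataset, batch_size=256, num_workers=2,
                            pin_memory=(device == "cuda"), prefetch_factor=2)

        scores = []
        with torch.inference_mode():
            for (batch,) in loader:
                batch = batch.to(device, non_blocking=True)
                if device == "cuda":
                    # Mixed precision (FP16) for higher GPU throughput.
                    with torch.autocast(device_type="cuda", dtype=torch.float16):
                        out = model(batch)
                else:
                    out = model(batch)
                scores.append(out.float().cpu())
        return torch.cat(scores)

    if __name__ == "__main__":
        main()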

When to choose batch processing:

  • Results can be pre-computed (recommendations, embeddings, scores)
  • Latency requirements are minutes to hours, not milliseconds
  • Dataset size is large enough that throughput optimization matters
  • Cost is a primary concern and real-time isn't required

Common batch processing tools:

  • Apache Airflow for workflow orchestration (minimal DAG sketched below)
  • Ray for distributed Python workloads
  • Spark for data transformation before/after inference
  • SageMaker Batch Transform for managed AWS batch inference
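
As a reference point, an orchestrated nightly scoring job can be a single-task Airflow DAG. This is a sketch assuming Airflow 2.x (where the schedule argument replaces the older schedule_interval); the DAG id, schedule, and callable are hypothetical placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def run_nightly_scoring():
        # Placeholder: load yesterday's records, run batch inference,
        # and write the scores back to storage.
        pass

    with DAG(
        dag_id="nightly_batch_scoring",   # hypothetical DAG name
        schedule="0 2 * * *",             # 02:00 daily, during off-peak hours
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="score_customers",
            python_callable=run_nightly_scoring,
        )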

Start with simple scheduled scripts before adding orchestration frameworks. Many batch AI workloads are straightforward Python scripts running via cron. Add complexity only when you need scheduling, retries, or monitoring at scale.
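
For example, a cron-driven job can be a single self-contained script like the sketch below; the paths, file layout, and embed_texts function are illustrative placeholders rather than any specific library's API.

    #!/usr/bin/env python3
    """Nightly embedding job. Example crontab entry (03:00 daily):
    0 3 * * * /usr/bin/python3 /opt/jobs/embed_documents.py
    """
    import json
    import pathlib

    INBOX = pathlib.Path("/data/uploads/pending")     # hypothetical input dir
    OUTBOX = pathlib.Path("/data/uploads/processed")  # hypothetical output dir

    def embed_texts(texts):
        # Stand-in for a real embedding model call.
        return [[float(len(t))] for t in texts]

    def main():
        files = sorted(INBOX.glob("*.txt"))
        if not files:
            return  # nothing uploaded since the last run
        vectors = embed_texts([f.read_text() for f in files])
        OUTBOX.mkdir(parents=True, exist_ok=True)
        for f, vec in zip(files, vectors):
            (OUTBOX / f"{f.stem}.json").write_text(json.dumps(vec))
            f.unlink()  # mark the upload as processed

    if __name__ == "__main__":
        main()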

Source

Batch prediction is optimized for high throughput, processing large volumes of data together, while online prediction is optimized for low latency.

https://cloud.google.com/architecture/ml-inference-batch-and-online