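Continuous batching, introduced as iteration-level scheduling in Orca (OSDI 2022), lets new requests join a running batch at iteration (token-step) boundaries instead of waiting for the whole batch to finish. Finished requests leave at the same boundaries, so their slots are reused immediately. The Orca paper reports a 36.9× throughput improvement at the same latency level compared with NVIDIA FasterTransformer on GPT-3 175B. The technique is used in vLLM, TGI, and TensorRT-LLM (where it is called in-flight batching).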
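The scheduling difference can be illustrated with a toy simulation. This is only a sketch under strong assumptions: all requests arrive at time 0, each decode step costs one time unit regardless of batch size, and prefill and KV-cache memory limits are ignored. The function names `static_batching` and `continuous_batching` are hypothetical and do not correspond to any API in vLLM, TGI, or TensorRT-LLM. Each function returns the step at which every request completes, given the number of output tokens per request.

```python
from collections import deque


def static_batching(lengths, batch_size):
    """Completion step per request when batches run to completion.

    Assumption: every request in a batch is returned only when the
    longest request in that batch has finished.
    """
    finish = []
    clock = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        clock += max(batch)  # the batch occupies the engine this long
        finish.extend([clock] * len(batch))
    return finish


def continuous_batching(lengths, batch_size):
    """Completion step per request with iteration-level scheduling.

    Assumption: a waiting request can take a free slot at any step
    boundary, and a finished request frees its slot immediately.
    """
    finish = [0] * len(lengths)
    waiting = deque(range(len(lengths)))
    running = {}  # request id -> tokens still to generate
    clock = 0
    while waiting or running:
        # Fill free slots before the next decode step.
        while waiting and len(running) < batch_size:
            rid = waiting.popleft()
            running[rid] = lengths[rid]
        clock += 1  # one decode step for every running request
        for rid in list(running):
            running[rid] -= 1
            if running[rid] <= 0:
                finish[rid] = clock
                del running[rid]
    return finish


if __name__ == "__main__":
    lengths = [2, 8, 3, 1]  # output tokens per request (made-up values)
    print("static:    ", static_batching(lengths, batch_size=2))
    print("continuous:", continuous_batching(lengths, batch_size=2))
```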
- continuous-batching
- vllm
- serving
- inference