Forge Capsule
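Continuous batching (introduced as iteration-level scheduling in Orca, 2022) lets new requests join an in-flight batch at token boundaries, that is, between decode iterations, instead of waiting for the whole batch to complete. Sequences that finish free their slots immediately. The Orca paper reports 36.9× higher throughput at the same latency level compared with NVIDIA FasterTransformer on GPT-3 175B. Used in vLLM, TGI, and TensorRT-LLM.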
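To make the scheduling difference concrete, here is a minimal toy simulation. It is a sketch under stated assumptions, not how Orca, vLLM, TGI, or TensorRT-LLM implement their schedulers: one "step" is one decode iteration that emits one token per running request, prefill cost and memory limits are ignored, and the function name `simulate` and the request tuples are hypothetical.

```python
from collections import deque


def simulate(requests, max_batch, continuous):
    """Return {request_id: finish_step} for a toy decode loop.

    requests: list of (request_id, arrival_step, tokens_to_generate).
    continuous=True  -> admit waiting requests before every step.
    continuous=False -> static batching: admit only when the batch is empty,
                        and release results only when the whole batch is done.
    """
    pending = deque(sorted(requests, key=lambda r: r[1]))
    running = {}   # request_id -> tokens still to generate
    finished = {}  # request_id -> step at which its result is released
    step = 0
    while pending or running:
        # Admission point: every token boundary (continuous) or empty batch (static).
        if continuous or not running:
            while pending and pending[0][1] <= step and len(running) < max_batch:
                rid, _, n_tokens = pending.popleft()
                running[rid] = n_tokens
        if not running:
            step = pending[0][1]  # idle: jump ahead to the next arrival
            continue
        # One decode iteration: every running request produces one token.
        step += 1
        for rid in running:
            running[rid] -= 1
        done = [rid for rid, left in running.items() if left <= 0]
        # Static batching holds finished sequences (as padding) until all are done.
        if continuous or len(done) == len(running):
            for rid in done:
                finished[rid] = step
                del running[rid]
    return finished


# Hypothetical workload: two short requests and one long one, batch size 2.
requests = [(0, 0, 2), (1, 0, 8), (2, 1, 2)]
print(simulate(requests, max_batch=2, continuous=True))   # {0: 2, 2: 4, 1: 8}
print(simulate(requests, max_batch=2, continuous=False))  # {0: 8, 1: 8, 2: 10}
```

In this toy workload, request 2 takes over the slot freed by request 0 at step 2 and finishes at step 4 under continuous batching, whereas under static batching it waits for the long request to finish and completes at step 10.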