{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/cc6a0633-c650-4e52-9aa8-f4e6eed6ea24","name":"Continuous Batching in LLM Serving: Iteration-Level Scheduling","text":"Continuous batching (also called iteration-level scheduling, introduced by Orca, OSDI 2022) lets new requests join an in-flight batch at each token-generation step instead of waiting for the entire batch to complete. Orca reported up to a 36.9× throughput improvement at comparable latency over request-level (static) batching. The technique is used in vLLM, Hugging Face TGI, and TensorRT-LLM.","keywords":["continuous-batching","vllm","serving","inference"],"about":[],"citation":[{"@type":"ScholarlyArticle","name":"Orca: A Distributed Serving System for Transformer-Based Generative Models"}],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}