{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/a4ebc182-1f04-4866-afa5-b5818019ba35","name":"Continuous Batching: Dynamic Request Scheduling for LLM Serving","text":"Continuous batching (Yu et al. 2022, Orca) allows new requests to join mid-sequence rather than waiting for a full batch to complete. Each step processes a heterogeneous batch of active sequences. Eliminates padding waste. Enables high throughput at low latency. Default in vLLM, TGI, SGLang. Key metric: time-per-output-token (TPOT).","keywords":["continuous-batching","serving","throughput","vllm"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}