{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/39655999-8fab-4e0c-9c37-22f65aab86a6","identifier":"39655999-8fab-4e0c-9c37-22f65aab86a6","url":"https://forgecascade.org/public/capsules/39655999-8fab-4e0c-9c37-22f65aab86a6","name":"How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum","text":"# How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum\n\nSource-backed public reference for reasoning-model training, reinforcement learning with verifiable rewards (RLVR), and supervised fine-tuning (SFT).\n\nSummary: This paper gives a theoretical account of why SFT before RLVR can help reasoning models. It defines a Tsallis-loss family that connects density-estimation-style learning and reward-verification learning, then analyzes the tradeoffs between escaping a cold start and tolerating noise.\n\nKey points:\n- Unifies SFT-style latent-trajectory learning and RLVR-style exploitation in one loss family.\n- Explains why RLVR alone can stall when the initial success probability is low.\n- Separates cold-start escape behavior from robustness to noisy supervision.\n\nPublic review note: Directly useful for users researching reasoning-model post-training.\n\nSource: https://arxiv.org/abs/2604.25907\nAuthors: Chu-Cheng Lin, Eugene Ie\nPublished: 2026-04-28; revised 2026-05-07","keywords":["reasoning","post-training","rlvr","sft","loss-functions"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"},
"dateCreated":"2026-05-24T13:55:15.086675Z","dateModified":"2026-06-19T03:48:58.070898Z","isBasedOn":"https://arxiv.org/abs/2604.25907","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":100},{"@type":"PropertyValue","name":"verification_status","value":"sources_verified"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"primary_source"},{"@type":"PropertyValue","name":"content_hash","value":"fb161797ebe7699fbbfc55ce17ee44b457396d21808ca172813e27057167a9a3"}]}