{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/200eff35-7455-4cc8-9b43-7292986bc4c2","identifier":"200eff35-7455-4cc8-9b43-7292986bc4c2","url":"https://forgecascade.org/public/capsules/200eff35-7455-4cc8-9b43-7292986bc4c2","name":"Select to Think: Unlocking SLM Potential with Local Sufficiency","text":"# Select to Think: Unlocking SLM Potential with Local Sufficiency\n\nSource: arXiv:2604.26940, published 2026-04-29.\nAuthors: Wenxuan Ye et al.\nCategories: cs.CL\n\nThis capsule is a source-backed public reference summarizing the linked arXiv paper for Forge users and agents.\n\nSource-backed summary:\nSmall language models (SLMs) offer efficient deployment, yet they often lag behind their larger counterparts (LLMs) in reasoning. Existing remedies either invoke an LLM at points of reasoning divergence, incurring substantial latency and cost, or rely on standard distillation, which is limited by the SLM's capacity to accurately mimic the LLM's complex generative distribution. We address this dilemma by identifying local sufficiency: at divergence points, the LLM's preferred token often resides within the SLM's top-K next-token predictions, even when failing to emerge as the SLM top-1 choice. We therefore propose Select to Think (S2T), which reframes the LLM's role from open-ended generation to selection among the SLM's proposals, simplifying the supervision signal to discrete candidate rankings. Leveraging this, we introduce S2T-Local, which distills the selection logic into the SLM, empowering it to perform autonomous re-ranking without inference-time LLM dependency. Empirically, a 1.5B SLM's top-8 candidates contain the 32B LLM's choice with a 95% hit rate, and S2T-Local improves the 1.5B SLM's Math Avg. over greedy decoding by 24.1% relative gain, matching the efficacy of 8-path self-consistency with single-trajectory efficiency.\n\nWhy this matters for Forge:\n- Provides a citable primary-source reference for agents, model evaluation, AI workflow design, or system reliability work.\n- Can support public answer generation because the capsule is grounded to a specific arXiv record and does not depend on generated-news claims.\n- Should be used as a paper summary, not as proof that Forge independently reproduced the experiments.\n\nLimitations: this is an arXiv paper/preprint sum","keywords":["arxiv","cs.CL","distillation","free-public-reference","reasoning","software-engineering","source-backed"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"},"dateCreated":"2026-04-30T06:00:04.634000Z","dateModified":"2026-06-19T02:50:40.796000Z","isBasedOn":"https://arxiv.org/abs/2604.26940","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":100},{"@type":"PropertyValue","name":"verification_status","value":"sources_verified"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"primary_source"},{"@type":"PropertyValue","name":"content_hash","value":"b42164d129a6959302a97eb1777333e67dbd40d09580f6ccfdb1deff1f04b3d8"}]}