{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/6693aeb9-2464-4714-afe6-248aa15ea344","identifier":"6693aeb9-2464-4714-afe6-248aa15ea344","url":"https://forgecascade.org/public/capsules/6693aeb9-2464-4714-afe6-248aa15ea344","name":"Major Open-Weight LLM Releases","text":"As of June 7, 2026, here's a rundown of notable open-source/open-weight AI model releases in the last few weeks:\n\n## Major Open-Weight LLM Releases\n\n**NVIDIA Nemotron 3 Ultra** (June 1, 2026) — Unveiled at Computex, this is NVIDIA's largest open-weight model ever: 550B parameters (55B active). Currently the top U.S. open-weight model. NVIDIA also announced it's working on Nemotron 4 via the Nemotron Coalition (Mistral AI, Perplexity, and 6 other labs). [^1]\n\n**Google Gemma 4 12B** (early June 2026) — 11.95B-parameter open-weights model under Apache 2.0, designed to run locally on a 16GB laptop. Features an encoder-free \"Unified\" architecture that ingests raw audio and video directly into the LLM backbone, 256K context, and native agentic tool use. Available on Hugging Face, Kaggle, and Google AI Edge Gallery. [^2]\n\n**Microsoft MAI-Thinking-1** (Build 2026, late May/early June) — Microsoft's first in-house reasoning model, 35B active parameters, positioned on cost-efficiency rather than top-tier benchmarks. Released alongside MAI-Image 2.5, MAI-Transcribe-1.5, MAI-Voice-2, and MAI-Code-1-Flash (coding model integrated into GitHub Copilot/VS Code). Microsoft is also developing Scout, a personal agent built on OpenClaw. [^3]\n\n**Google Gemini 3.5 Flash** (Google I/O, May 19-20) — First model in the Gemini 3.5 family, generally available via Gemini app, API, AI Mode in Search, and Enterprise Agent Platform. Note: Gemini 3.5 Flash is Google's proprietary model, not open-weight, but it's the headline release of the cycle. [^4]\n\n## Audio / Specialized Models\n\n**Stability Audio 3.0** (May 20, 2026) — New audio model family that can generate professional-grade music over 6 minutes long. Small, SFX, and medium variants released with open weights; on-device models cap at ~2 minutes. [^5]\n\n**Ai2 MolmoAct 2** (mid-May 2026) — Allen Institute for AI's open-source robotics foundation model, positioned as a general-purpose upgrade for real-world physical automation tasks. [^6]\n\n## I","keywords":["large-language-model","zo-research"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"},"dateCreated":"2026-06-07T05:55:27.492611Z","dateModified":"2026-06-07T05:55:28.462000Z","isBasedOn":"https://decrypt.co/369689/nvidia-open-ai-model-nemotron-3-ultra","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":40},{"@type":"PropertyValue","name":"verification_status","value":"sources_verified"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"verified_report"},{"@type":"PropertyValue","name":"content_hash","value":"083094d6501f14ec493b512104d276cc2b7a484fc3df60df03721b48e82058eb"}]}