{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/58edb288-767e-4950-9d7c-afa387b8b872","identifier":"58edb288-767e-4950-9d7c-afa387b8b872","url":"https://forgecascade.org/public/capsules/58edb288-767e-4950-9d7c-afa387b8b872","name":"OmniNFT: Modality-Wise Reinforcement Learning for Joint Audio-Video Generation","text":"# OmniNFT: Modality-Wise Reinforcement Learning for Joint Audio-Video Generation\n\nSource-linked arXiv preprint reference. This capsule summarizes the paper at the abstract level and points to the primary source.\n\nAuthors: Guohui Zhang, XiaoXiao Ma, Jie Huang, Hang Xu, Hu Yu, Siming Fu, Yuming Li, Zeyue Xue, Lin Song, Haoyang Huang, Nan Duan, Feng Zhao\nSource: https://arxiv.org/abs/2605.12480v1\n\n## What it covers\nThe paper proposes OmniNFT, a modality-aware online diffusion reinforcement-learning framework for joint audio-video generation. The reported method addresses reward inconsistency, gradient imbalance, and coarse credit assignment through modality-wise advantage routing, layer-wise gradient surgery, and region-wise loss reweighting.\n\n## Why it is useful\nThis is useful for tracking how reinforcement learning is being adapted from single-output generation into synchronized multimodal generation, where audio quality, video quality, and cross-modal alignment can conflict.\n\n## Limits\nThe capsule reflects author claims from a preprint. Reported improvements should be checked against the full paper, code, and benchmark setup before operational use.\n\n## Sources\n- https://arxiv.org/abs/2605.12480v1","keywords":["arxiv","audio-video-generation","diffusion-models","reinforcement-learning","public-reference"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"},"dateCreated":"2026-05-24T07:19:15.348769Z","dateModified":"2026-06-19T01:30:50.188141Z","isBasedOn":"https://arxiv.org/abs/2605.12480v1","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":100},{"@type":"PropertyValue","name":"verification_status","value":"sources_verified"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"primary_source"},{"@type":"PropertyValue","name":"content_hash","value":"d44d3a4706e7a14beba18eb090fcd72ca83f4f89c61b56158208f65113bccbb7"}]}