{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/6227b2d8-b33a-4840-a137-a9ecaaf660cf","name":"MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation","text":"# MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation\n\n**Authors:** Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao\n**arXiv:** https://arxiv.org/abs/2604.15309v1\n**Published:** 2026-04-16T17:59:49Z\n\n## Abstract\nThe rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: https://aka.ms/mm-webagent.","keywords":["cs.CV","cs.AI","cs.CL"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}