* Equal contribution.
† Corresponding author: jietang@mail.tsinghua.edu.cn
UI2CodeN is a visual language foundation model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. It unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing.
User interface (UI) programming is a core yet highly complex part of modern software development. Recent advances in visual language models (VLMs) highlight the potential of automatic UI coding, but current approaches face two key limitations: multimodal coding capabilities remain underdeveloped, and single-turn paradigms make little use of iterative visual feedback. We address these challenges with an interactive UI-to-code paradigm that better reflects real-world workflows and raises the upper bound of achievable performance. Under this paradigm, we present UI2CodeN, a visual language foundation model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. The model unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing. We further explore test-time scaling for interactive generation, enabling systematic use of multi-turn feedback. Experiments on UI-to-code and UI polishing benchmarks show that UI2CodeN establishes a new state of the art among open-source models and achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5.
UI2CodeN follows an interactive UI-to-code paradigm that progressively generates, edits, and refines UI code with visual feedback.
Although recent VLMs have demonstrated substantial progress on general vision benchmarks, their performance on UI coding remains limited due to two main challenges.
First, the inherent difficulty of UI coding: the model must perceive UI-style images with fine-grained details such as icons, fonts, and layout structures, which differ markedly from the natural images used in conventional pretraining. Moreover, UI code (HTML/CSS/JavaScript) can exceed 10,000 tokens and requires precise image-code alignment at both the global and element levels.
Second, the limitations of available training data. Real webpages provide rich diversity but contain noisy HTML tied to external resources, making them unsuitable for direct supervision. In contrast, synthetic or pruned datasets are clean but lack real-world complexity. As a result, prior models mainly rely on synthetic data, leaving large-scale real website data underutilized and limiting performance in practical applications.
To address these challenges, we make the following contributions:
We propose a novel interactive UI-to-code paradigm that fundamentally departs from prior single-turn generation approaches, redefining UI-to-code as an iterative, interactive process of generation, editing, and polishing (sketched below). This paradigm supports flexible usage, improves performance, and enables test-time scaling in UI-to-code generation.
Guided by this paradigm, we present UI2CodeN, a powerful visual language model trained via a three-stage training pipeline: large-scale pretraining on noisy real-world data to build broad multimodal foundations, supervised fine-tuning on synthetic datasets to improve code quality, and reinforcement learning with a carefully designed verifier to exploit unpaired real webpages while maintaining generation fidelity.
Experimental results demonstrate that UI2CodeN achieves state-of-the-art performance in UI coding. Building upon the core task of UI-to-code, UI2CodeN further extends its capabilities to UI polishing and UI editing.
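To make the paradigm concrete, here is a minimal sketch of the interactive loop. It is illustrative only, not a released API: `generateCode`, `renderScreenshot`, `visualFeedback`, and `reviseCode` are hypothetical helpers standing in for the model's generation call, a headless-browser render, the visual comparison signal, and the editing/polishing call.

```javascript
// Minimal sketch of the interactive UI-to-code loop. All four helpers are
// hypothetical stand-ins, not part of a released UI2CodeN API.
async function interactiveUIToCode(targetScreenshot, maxTurns = 4) {
  // Turn 1: single-shot UI-to-code generation from the target screenshot.
  let code = await generateCode({ image: targetScreenshot });

  for (let turn = 1; turn < maxTurns; turn++) {
    // Render the candidate code (e.g., in a headless browser) and compare
    // the rendering against the target to obtain visual feedback.
    const rendered = await renderScreenshot(code);
    const feedback = await visualFeedback(targetScreenshot, rendered);
    if (feedback.closeEnough) break;

    // Later turns: polish the code conditioned on both screenshots and the
    // feedback, rather than regenerating from scratch.
    code = await reviseCode({ code, target: targetScreenshot, rendered, feedback });
  }
  return code;
}
```

Under this view, test-time scaling simply means spending a larger interaction budget, for example raising `maxTurns` or sampling several candidates per turn and keeping the one with the best visual feedback.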
UI-to-Code Generation: Generate clean, executable HTML/CSS/JavaScript code directly from UI screenshots with high fidelity.
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Amazon Homepage</title>
  <style>
    * {
      margin: 0;
      padding: 0;
      box-sizing: border-box;
    }
    body {
      font-family: Arial, sans-serif;
      background: #e3e6e6;
    }
    .header {
      background: #131921;
      color: white;
      padding: 10px 20px;
      display: flex;
      align-items: center;
      gap: 20px;
    }
    .search-bar {
      flex: 1;
      display: flex;
      max-width: 800px;
    }
    .hero-banner {
      background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
      padding: 40px;
      text-align: center;
      color: white;
    }
    .product-grid {
      display: grid;
      grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
      gap: 20px;
      padding: 20px;
    }
  </style>
</head>
<body>
  <!-- Header content -->
  <div class="header">
    <div class="logo">Amazon</div>
    <div class="search-bar">
      <input type="text" placeholder="Search Amazon">
    </div>
  </div>
  <!-- Hero Banner -->
  <div class="hero-banner">
    <h1>Prime Big Deal Days</h1>
    <p>Members unlock early deals</p>
  </div>
  <!-- Product Grid -->
  <div class="product-grid">
    <!-- Product cards here -->
  </div>
</body>
</html>
```
UI Editing: Modify existing UI code (for example, the homepage generated above) based on natural language instructions and visual context, as illustrated in the sketch below.
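For illustration only, suppose the instruction is "restyle the hero banner to match the site's dark theme" (a hypothetical instruction; the replacement colors are likewise illustrative). The model returns the same document with only the affected rule rewritten:

```html
<!-- Hypothetical edit: only the .hero-banner rule changes; the rest of the
     document generated above is left untouched. -->
<style>
  .hero-banner {
    background: linear-gradient(135deg, #232f3e 0%, #131921 100%); /* was the purple gradient */
    padding: 40px;
    text-align: center;
    color: white;
  }
</style>
```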
UI Polishing: Iteratively refine UI layout, spacing, typography, and aesthetics to match the design style. Starting from a basic layout:
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Basic Layout</title>
  <style>
    body {
      font-family: Arial;
      margin: 20px;
    }
    .header {
      background: #333;
      color: white;
      padding: 20px;
    }
    .content {
      margin-top: 20px;
    }
  </style>
</head>
<body>
  <div class="header">
    <h1>Header</h1>
  </div>
  <div class="content">
    <p>Content here</p>
  </div>
</body>
</html>
```
After iterative polishing:
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Polished Layout</title>
  <style>
    * {
      margin: 0;
      padding: 0;
      box-sizing: border-box;
    }
    body {
      font-family: 'Segoe UI', system-ui, sans-serif;
      background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%);
      padding: 40px;
    }
    .header {
      background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
      color: white;
      padding: 30px 40px;
      border-radius: 12px;
      box-shadow: 0 10px 30px rgba(0,0,0,0.2);
    }
    .content {
      margin-top: 30px;
      padding: 30px;
      background: white;
      border-radius: 12px;
      box-shadow: 0 5px 15px rgba(0,0,0,0.1);
    }
  </style>
</head>
<body>
  <div class="header">
    <h1 style="font-size: 32px; font-weight: 600;">Modern Header</h1>
  </div>
  <div class="content">
    <p style="line-height: 1.8; color: #333;">Beautifully styled content</p>
  </div>
</body>
</html>
```
We evaluate UI2CodeN on two major tasks: (1) UI-to-Code generation and (2) UI Polishing. Results demonstrate that UI2CodeN consistently achieves state-of-the-art performance among all open-source models and is competitive with leading closed-source VLMs.
UI2CodeN outperforms GPT-4o, Claude 3.7/4 Sonnet, and Gemini-2.5 on most UI coding benchmarks and approaches GPT-5 in overall performance. Our RL-tuned version, UI2CodeN-9B-RL, achieves the best overall open-source performance and even surpasses several commercial models.
UI-to-Code benchmarks: Design2Code, Flame, Web2Code, UI2Code-Real. UI Polishing benchmarks: UIPolish-Real, UIPolish-Synthetic.

| Model | Design2Code | Flame | Web2Code | UI2Code-Real | UIPolish-Real | UIPolish-Synthetic |
|---|---|---|---|---|---|---|
| *Open-source VLM* | | | | | | |
| InternVL3-9B | 15.3 | 11.3 | 12.3 | 16.5 | 4.0 | 7.0 |
| InternVL3-78B | 30.0 | 51.3 | 45.5 | 30.4 | 10.0 | 15.0 |
| Qwen2.5-VL-7B | 29.1 | 25.0 | 37.2 | 26.1 | 11.0 | 14.0 |
| Qwen2.5-VL-72B | 41.9 | 46.3 | 64.1 | 40.9 | 23.0 | 38.0 |
| MiMo-VL-7B-SFT | 28.3 | 10.0 | 44.3 | 33.9 | 17.0 | 33.0 |
| MiMo-VL-7B-RL | 28.7 | 8.8 | 38.3 | 30.4 | 16.0 | 30.0 |
| Kimi-VL-A3B-Instruct | 27.3 | 50.0 | 69.1 | 26.1 | 14.0 | 40.0 |
| Kimi-VL-A3B-Thinking | 38.8 | 36.3 | 46.6 | 27.0 | 14.0 | 27.0 |
| GLM-4.1V-9B-Thinking | 64.7 | 72.5 | 71.3 | 53.0 | 42.0 | 46.0 |
| *Closed-source VLM* | | | | | | |
| Claude-4-Sonnet-thinking | 81.2 | 76.3 | 85.1 | 63.5 | 78.0 | 65.0 |
| Claude-3.7-Sonnet-thinking | 77.7 | 80.0 | 73.3 | 55.8 | 75.0 | 62.0 |
| GPT-5 | 89.7 | 91.3 | 93.7 | 67.8 | 85.0 | 68.0 |
| GPT-4o | 35.3 | 75.0 | 62.7 | 21.7 | 26.0 | 14.0 |
| o4-mini | 63.8 | 83.8 | 77.9 | 59.1 | 65.0 | 65.0 |
| Gemini-2.5-Pro | 89.5 | 87.5 | 90.6 | 68.7 | 74.0 | 68.0 |
| Gemini-2.5-Flash | 70.5 | 72.5 | 85.7 | 62.6 | 17.0 | 24.0 |
| Doubao-1.5-thinking-vision | 53.7 | 78.8 | 55.6 | 38.3 | 51.0 | 61.0 |
| Doubao-1.6-thinking-250715 | 62.4 | 67.7 | 67.2 | 43.4 | 61.0 | 67.0 |
| *UI2CodeN (ours)* | | | | | | |
| UI2CodeN-9B-SFT | 79.3 | 85.0 | 80.8 | 67.0 | 76.0 | 89.0 |
| UI2CodeN-9B-RL | 88.6 | 95.0 | 92.5 | 76.5 | 80.0 | 94.0 |
```bibtex
@article{ui2coden2025,
  title   = {UI2Code$^{N}$: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation},
  author  = {Yang, Zhen and Hong, Wenyi and Xu, Mingde and Fan, Xinyue and Wang, Weihan and Gu, Xiaotao and Tang, Jie},
  journal = {arXiv preprint arXiv:2511.08195},
  year    = {2025}
}
```