WebUI95: UI-to-Code Generation and Analysis
This repository contains the code and data for the Web++ project, which evaluates UI-to-code generation models.
Repository Structure
web++/
├── WebUI95/ # UI-to-Code Experiment Pipeline
│ ├── output/ # Final generated HTML (63,824 samples)
│ │ └── final_html/ # Generated static HTML files
│ ├── final_results_64k.zip # Compressed final results
│ ├── slurm/ # SLURM job scripts
│ └── *.py # Processing scripts
│
├── data_analysis/ # Quality and Diversity Analysis
│ ├── analysis/ # Analysis results
│ │ ├── *.json # Embedding and metric files
│ │ └── plots/ # Generated figures
│ ├── webgen_bench/ # WebGen-Bench baseline
│ │ └── fixed_samples.zip # Fixed prompt experiments (475 samples)
│ ├── scripts/ # Analysis scripts
│ └── slurm/ # SLURM job scripts
│
└── venv/ # Python virtual environment
Datasets
WebUI95 Dataset
- Location: WebUI95/output/final_html/ or WebUI95/final_results_64k.zip
- Size: 63,824 UI samples
- Format: Each sample contains generated static HTML/CSS
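Because the dataset ships both as a directory and as a zip archive, a small reader that accepts either form can be convenient. The following is a minimal sketch, not part of the repository: the function name `iter_html_samples` is hypothetical, and it assumes samples are stored as files ending in `.html`; the internal layout of the archive is not documented here, so adjust the filter if it differs.

```python
import zipfile
from pathlib import Path
from typing import Iterator, Tuple


def iter_html_samples(source: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, html_text) pairs from a directory tree or a .zip archive.

    Assumption: every sample is a file whose name ends in ".html".
    """
    path = Path(source)
    if path.is_dir():
        # Walk the directory recursively in a stable, sorted order.
        for html_file in sorted(path.rglob("*.html")):
            text = html_file.read_text(encoding="utf-8", errors="replace")
            yield str(html_file.relative_to(path)), text
    elif zipfile.is_zipfile(path):
        # Read members straight from the archive without extracting it.
        with zipfile.ZipFile(path) as archive:
            for name in sorted(archive.namelist()):
                if name.lower().endswith(".html"):
                    yield name, archive.read(name).decode("utf-8", errors="replace")
    else:
        raise FileNotFoundError(f"not a directory or zip archive: {source}")


# Example (paths taken from the list above):
# for name, html in iter_html_samples("WebUI95/final_results_64k.zip"):
#     print(name, len(html))
```

Reading from the archive directly avoids unpacking tens of thousands of small files onto a shared filesystem.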
WebGen-Bench Fixed Prompt Samples
- Location: data_analysis/webgen_bench/fixed_samples.zip
- Size: 475 samples
- Purpose: Control experiment for diversity analysis (same prompt, varying seeds)
Key Scripts
WebUI95 Pipeline
| Script | Description |
|---|---|
| run_inference_multigpu.py | Multi-GPU inference for UI-to-code generation |
| process_unified_data.py | Data preprocessing |
| merge_outputs.py | Merge outputs from multiple workers |
Data Analysis
| Script | Description |
|---|---|
| scripts/uiclip_analysis.py | UIClip quality and similarity scoring |
| scripts/compute_code_embeddings.py | Qwen code embeddings for diversity |
| scripts/generate_figures.py | Generate analysis figures |
| scripts/html_statistics.py | DOM complexity analysis |
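To illustrate what a DOM complexity measurement can look like, here is a minimal sketch using only the Python standard library. It is not the code in scripts/html_statistics.py, whose metrics are not listed here; the names `DomStats` and `dom_statistics` and the three reported statistics (element count, distinct tags, maximum nesting depth) are assumptions chosen for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not increase depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class DomStats(HTMLParser):
    """Collect tag counts and maximum nesting depth while parsing."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.tag_counts: Counter = Counter()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in VOID_TAGS:
            # A void element is a leaf one level below the current node.
            self.max_depth = max(self.max_depth, self.depth + 1)
        else:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def dom_statistics(html: str) -> dict:
    """Return simple structural statistics for one HTML document."""
    parser = DomStats()
    parser.feed(html)
    parser.close()
    return {
        "total_elements": sum(parser.tag_counts.values()),
        "unique_tags": len(parser.tag_counts),
        "max_depth": parser.max_depth,
    }
```

This parser does not repair malformed markup, so tags with omitted closing tags (for example an unclosed `<li>`) will inflate the depth estimate; a tree-building parser would be more robust for real-world pages.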
Results
Visual Quality (UIClip)
| Metric | Original | Generated |
|---|---|---|
| Mean Quality Score | 0.487 | 0.518 |
| Mean Similarity | - | 0.938 |
Diversity Analysis
| Type | Original | Generated | Synthetic (Fixed) |
|---|---|---|---|
| Visual | 0.150 | 0.137 | 0.140 |
| Code | 0.119 | 0.049 | 0.009 |
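Key Finding: The synthetic fixed-prompt experiment shows near-zero code diversity (0.009), indicating mode collapse when every sample is generated from an identical prompt. The generated set retains roughly five times that code diversity (0.049), although it remains below the original data (0.119). Visual diversity is similar across all three sets (0.137 to 0.150).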
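The diversity scores above are computed from embeddings, but the exact formula is not stated here. One common choice, shown below purely as an assumption, is the mean pairwise cosine distance over a set of embedding vectors: identical samples score near 0 and mutually orthogonal ones score 1. The function names are hypothetical and this sketch is not taken from the analysis scripts.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def mean_pairwise_cosine_distance(embeddings: Sequence[Sequence[float]]) -> float:
    """Average cosine distance over all unordered pairs of embeddings.

    Assumption: this is one plausible diversity score, not necessarily
    the one used to produce the table above.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)
```

The pure-Python loop is quadratic in the number of samples, so for tens of thousands of embeddings a vectorized implementation or a random subsample of pairs would be more practical.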
Figures
Generated figures are in data_analysis/analysis/plots/:
- figure2_html_statistics.png - HTML structure comparison
- figure3_combined_diversity_comparative.png - Diversity analysis
Requirements
- Python 3.10+
- PyTorch 2.0+
- Transformers
- Playwright (for screenshot generation)
- UIClip model: biglab/uiclip_jitteredwebsites-2-224-paraphrased_webpairs_humanpairs
- Qwen models: Qwen/Qwen3-Coder-30B-A3B-Instruct, Qwen/Qwen2.5-Coder-1.5B
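Before submitting long jobs, it can help to confirm that the environment meets the requirements listed above. The sketch below is a hypothetical helper, not part of the repository: it checks the Python version and whether the `torch`, `transformers`, and `playwright` modules are importable. It does not verify the PyTorch 2.0+ version constraint or download any models.

```python
import sys
from importlib.util import find_spec
from typing import Iterable, List, Tuple

# Top-level import names for the packages listed in the requirements.
REQUIRED_MODULES = ("torch", "transformers", "playwright")


def missing_requirements(
    modules: Iterable[str] = REQUIRED_MODULES,
    min_python: Tuple[int, int] = (3, 10),
) -> List[str]:
    """Return a list of unmet requirements (empty if everything is present)."""
    problems: List[str] = []
    if sys.version_info[:2] < min_python:
        problems.append(f"python>={min_python[0]}.{min_python[1]}")
    # find_spec returns None for a top-level module that cannot be located.
    problems.extend(name for name in modules if find_spec(name) is None)
    return problems


# Example:
# problems = missing_requirements()
# if problems:
#     print("Missing:", ", ".join(problems))
```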
Compute Environment
- GPU: NVIDIA A40 (48GB VRAM)
- Partitions: gpu_a40, cpu_sapphire
Citation
If you use this code or data, please cite:
@inproceedings{webuininetyfive2026,
title={WebUI-95: A Large-Scale Dataset of Normalized Web Interfaces via UI-to-Code Generation},
author={...},
booktitle={CHI Posters},
year={2026}
}
Acknowledgments
Computational resources were provided by the institutional HPC cluster.