Why self-evolving?
Distilling from a fixed set of GPT-4o outputs underfits rare error cases and diagram-heavy prompts. MathSE pairs feedback from an outcome reward model (ORM) with revised reasoning traces, so the training corpus grows alongside model capability.
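The loop described above can be sketched roughly as follows. This is a minimal illustration, not MathSE's actual code: `Trace`, `Verdict`, `evolve_round`, and the three toy callables are hypothetical names, and re-judging a revised trace before adding it to the corpus is an assumption rather than something the text states.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trace:
    problem: str
    steps: List[str]


@dataclass
class Verdict:
    correct: bool
    first_error_step: Optional[int]  # index of the first wrong step, None if correct
    critique: str


def evolve_round(
    problems: List[str],
    corpus: List[Trace],
    generate: Callable[[str], Trace],
    judge: Callable[[Trace], Verdict],
    revise: Callable[[Trace, Verdict], Trace],
) -> List[Trace]:
    """One round: sample a trace per problem, judge it, keep it or a fixed revision."""
    grown = list(corpus)  # copy so the caller's corpus is not mutated
    for problem in problems:
        trace = generate(problem)
        verdict = judge(trace)
        if verdict.correct:
            grown.append(trace)
            continue
        revised = revise(trace, verdict)
        # Assumption: a revision only enters the corpus if the judge accepts it.
        if judge(revised).correct:
            grown.append(revised)
    return grown


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_generate(problem: str) -> Trace:
    steps = ["ok", "wrong"] if problem == "hard" else ["ok"]
    return Trace(problem, steps)


def toy_judge(trace: Trace) -> Verdict:
    for i, step in enumerate(trace.steps):
        if step == "wrong":
            return Verdict(False, i, f"step {i} is incorrect")
    return Verdict(True, None, "")


def toy_revise(trace: Trace, verdict: Verdict) -> Trace:
    steps = list(trace.steps)
    steps[verdict.first_error_step] = "ok"  # repair only the flagged step
    return Trace(trace.problem, steps)


corpus = evolve_round(["easy", "hard"], [], toy_generate, toy_judge, toy_revise)
```

In a real setting the three callables would wrap the policy model, the ORM, and a reflection step. Calling `evolve_round` repeatedly, with the model retrained on the grown corpus between rounds, is what would make the data track model capability.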
Headline results
- +2.1 average gain on MathVL-test across CogVLM2, Qwen2-VL-7B, and InternVL2.5-8B.
- The ORM identifies the first failing step in 81% of cases, which keeps critiques actionable.
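For readers who want to reproduce a first-failure-step number on their own data, one plausible way to compute it is sketched below. The text does not say how the 81% figure was measured, so the exact-match definition, the function names, and the toy indices are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def first_error_index(step_ok: Sequence[bool]) -> Optional[int]:
    """Return the index of the first incorrect step, or None if all steps pass."""
    for i, ok in enumerate(step_ok):
        if not ok:
            return i
    return None


def detection_rate(
    predicted: List[Optional[int]], gold: List[Optional[int]]
) -> float:
    """Fraction of traces where the predicted first-error index matches the annotation."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one trace")
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


# Toy indices (not real results): the judge matches on three of four traces.
toy_predicted = [1, 2, None, 0]
toy_gold = [1, 2, None, 3]
toy_rate = detection_rate(toy_predicted, toy_gold)
```

Exact index match is a strict criterion. A looser variant that accepts a prediction within one step of the annotation would give a higher rate, so the definition should be reported alongside any number.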