ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization
Original reporting by arXiv (cs.AI)

The body of formalized mathematics is expanding rapidly, and verified proofs are becoming critical in fields from software engineering to AI safety. This growth demands ongoing refactoring, both to keep proof libraries maintainable and to generate high-quality training data for emerging neural provers. Optimizing these proofs is a formidable challenge: proofs are highly heterogeneous, training data is sparse, and both training and inference are computationally expensive. Addressing these hurdles is vital for the future of automated reasoning.
A new approach
Researchers have introduced ImProver 2, a neurosymbolic framework for automated proof optimization within the Lean 4 ecosystem. At its core, ImProver 2 combines a data-efficient expert-iteration pipeline with a "scaffold" that exposes formal proof structure alongside lightweight informal abstractions. The work also contributes a new suite of metrics for evaluating structural proof properties. A 7-billion-parameter model trained with ImProver 2 outperforms significantly larger models in its class and rivals the performance of mid-tier frontier systems. The results indicate that, with appropriate scaffolding and training, smaller models can effectively restructure complex, research-level proofs, establishing proof optimization as a scalable, learnable task within reach of more modest AI systems.
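To make the expert-iteration idea concrete, here is a minimal Python sketch of the generic loop: sample candidate rewrites from the current policy, keep only those that pass a verifier and improve a metric, then update the policy on the survivors. Everything in it is a hypothetical stand-in rather than ImProver 2's actual implementation: the toy "proof" (a list of integers that must sum to `TARGET`), `verify` in place of the Lean checker, proof length as the optimization metric, and a simple action-reweighting rule in place of fine-tuning a language model.

```python
import random
from collections import Counter

# Hypothetical toy domain: a "proof" is a list of integer steps, and it is
# valid iff the steps sum to TARGET. Shorter valid proofs score better.
TARGET = 10


def verify(proof):
    """Stand-in for the Lean checker: accept iff the steps sum to TARGET."""
    return sum(proof) == TARGET


def propose(proof, policy, rng):
    """Sample a rewrite action from the policy and apply it to a copy of proof."""
    actions = list(policy)
    action = rng.choices(actions, weights=[policy[a] for a in actions])[0]
    new = list(proof)
    if action == "merge" and len(new) >= 2:
        # Fuse two adjacent steps; this always preserves validity.
        i = rng.randrange(len(new) - 1)
        new[i:i + 2] = [new[i] + new[i + 1]]
    elif action == "drop" and new:
        # Delete one step; this preserves validity only if the step was 0.
        del new[rng.randrange(len(new))]
    return action, new


def expert_iteration(proofs, rounds=5, samples=8, seed=0):
    """Repeat: generate candidates -> filter by verifier and metric -> retrain."""
    rng = random.Random(seed)
    policy = {"merge": 1.0, "drop": 1.0}  # unnormalized action weights
    for _ in range(rounds):
        wins = Counter()
        next_proofs = []
        for proof in proofs:
            best = proof
            for _ in range(samples):
                action, cand = propose(best, policy, rng)
                # Keep a candidate only if it verifies AND improves the metric.
                if verify(cand) and len(cand) < len(best):
                    best = cand
                    wins[action] += 1
            next_proofs.append(best)
        # "Training": shift weight toward actions whose samples survived the
        # filter (a crude stand-in for fine-tuning a model on its own wins).
        for action in policy:
            policy[action] += wins[action]
        proofs = next_proofs
    return proofs, policy


if __name__ == "__main__":
    start = [[1, 2, 3, 4], [5, 0, 5], [2, 2, 2, 2, 2]]
    final, policy = expert_iteration(start)
    print(final, policy)
```

The verifier is the only gatekeeper here, so an unverified candidate can never replace a proof; the policy simply drifts toward the kinds of rewrites that keep passing the check. In a system like the one described, the analogous update would presumably be a fine-tuning step on accepted (original, improved) proof pairs rather than a weight increment.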
ImProver 2 represents a significant step forward in automated proof optimization. Its importance lies less in competing with larger, more resource-intensive systems than in showing that this capability does not depend on them. Optimizing formal proofs this efficiently addresses a pressing need: keeping the growing libraries of verified mathematics manageable and improving the quality of training data for future AI provers.