Oli Ataeva, Khalov A., Tuchkova N.
We investigate whether structured knowledge retrieval from a mathematical library's dependency graph can improve neural theorem proving at inference time while maintaining explainability of the retrieved context. Through a controlled ablation study on the miniF2F-Test benchmark (197~tasks, 8~non-chain-of-thought modes, $K{=}8$ attempts per mode), we find that graph-based retrieval augmentation significantly improves proof generation on hard problems -- those where the base model's parametric knowledge is insufficient -- while having no measurable effect on problems the model already solves reliably. On 109~hard tasks, the best graph-augmented mode nearly triples the baseline success rate (pass@1: $3.56\% \to 10.42\%$, $+6.8$~percentage points, $p{=}0.001$, Wilcoxon signed-rank test). Deterministic pattern-based graph entry points outperform LLM-generated ones ($11\times$ faster, higher accuracy). The retrieved graph context is fully traceable: each hint maps to a specific edge in the Mathlib dependency graph, enabling the user to verify \emph{why} particular lemmas were suggested. The approach is training-free, composable with any base prover, and adds negligible computational overhead.
Oli Ataeva, Tuchkova
The subject of the study is the problem of adapting language models to scientific subject areas. The issues of expanding language models to mathematical subject areas are considered. It is proposed to use the knowledge graph of the subject area as a tool for `tuning' the language model. To build the knowledge graph, the ontology of the subject area of he semantic library of mathematical subject areas and their applications LibMeta is used.
Navigation through the subject area is carried out using the knowledge graph and is limited
by the terminology of the thesaurus and ontology links. This approach allows using
the knowledge graph to create a digital assistant in a recommender system, an agent for a
language model, and to feed mathematical text data to a language model.