Mathematics & AI

ISSN: 0000-0000 · EN

Mathematics & AI is an open-access, peer-reviewed journal at the intersection of mathematics and artificial intelligence. The journal publishes original research on the mathematical foundations of AI, machine learning theory, optimization, statistical learning, neural network analysis, computational mat...
Article #1015
Issue MathAI 2026 Selected Papers Special Issue
Received 15 Apr 2026
Accepted 15 May 2026
Published 22 May 2026

A Systematic Study of Gate Functions in Soft Adaptive Policy Optimization

Abstract

Group Relative Policy Optimization (GRPO) has significantly advanced the training of large language models and improved their reasoning capabilities, but its reliance on hard clipping of the importance ratio leaves it susceptible to training instability. Soft Adaptive Policy Optimization (SAPO) addresses this limitation by replacing hard clipping with a smooth, sigmoid-based gate function, which yields more stable updates. We extend this idea and investigate how the choice of gate function affects both training stability and final model performance. We formalize the key properties that an admissible gate should satisfy and propose several families of such functions for empirical evaluation. We report and analyze experiments with the Qwen2.5-7B-Instruct model on mathematical reasoning tasks. The results provide practical guidance for designing smoother and more robust policy optimization objectives for large language model training.
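To make the contrast between hard clipping and a soft gate concrete, the sketch below compares the per-token gradient weight (the derivative of the surrogate with respect to the importance ratio r, divided by the advantage A) under PPO/GRPO-style clipping and under a sigmoid gate. It is a minimal illustration, not the paper's implementation: it assumes a SAPO-style gated objective of the form (4/τ)·σ(τ(r − 1))·A, and the function names, eps = 0.2, and τ = 10 are hypothetical choices. The exact gate, temperatures, and any asymmetric treatment of positive and negative advantages used in the paper are not specified in this abstract.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, split by sign to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def clip_gate_weight(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """d/d(ratio) of the clipped surrogate min(r*A, clip(r, 1-eps, 1+eps)*A), divided by A.
    It is either 1 (gradient passes through) or 0 (token is hard-clipped)."""
    if advantage > 0:
        return 0.0 if ratio > 1.0 + eps else 1.0
    if advantage < 0:
        return 0.0 if ratio < 1.0 - eps else 1.0
    return 0.0  # zero advantage: the surrogate and its gradient vanish


def soft_surrogate(ratio: float, advantage: float, tau: float) -> float:
    """Assumed (hypothetical) SAPO-style objective: (4 / tau) * sigmoid(tau * (r - 1)) * A."""
    return (4.0 / tau) * sigmoid(tau * (ratio - 1.0)) * advantage


def soft_gate_weight(ratio: float, tau: float) -> float:
    """d/d(ratio) of soft_surrogate, divided by A: 4 * p * (1 - p) with p = sigmoid(tau * (r - 1)).
    It equals 1 at r = 1 and decays smoothly as r moves away from 1 in either direction."""
    p = sigmoid(tau * (ratio - 1.0))
    return 4.0 * p * (1.0 - p)


if __name__ == "__main__":
    tau = 10.0  # hypothetical temperature, chosen only to make the decay visible
    for r in (0.7, 0.9, 1.0, 1.1, 1.3, 1.6):
        print(f"r={r:.1f}  clip={clip_gate_weight(r, 1.0):.0f}  soft={soft_gate_weight(r, tau):.3f}")
```

For a positive advantage, the hard-clip weight drops abruptly from 1 to 0 once r exceeds 1 + eps, whereas the soft weight peaks at 1 for r = 1 and decays smoothly on both sides, so off-policy tokens are attenuated rather than cut off. Swapping soft_gate_weight for another smooth, bell-shaped weight is one way to experiment with alternative gate families; the admissibility conditions the paper formalizes are not reproduced here.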

Cite this article

Denisov, E.; Glazyrina, S.; Kryzhanovskiy, M.; Ischenko, R. A Systematic Study of Gate Functions in Soft Adaptive Policy Optimization. Mathematics & AI 2026, 1, 12. https://enigma.ist/j/mathematics-ai/1/2/12