[Coming soon!] Step-Audio R1: Achieving Gemini 2.5 Pro-Level Reasoning in a Zero-Latency Speech LLM
Oct 1, 2025 · 1 min read

Fei Tian
Abstract
Step-Audio R1 represents the DeepSeek R1 moment for large speech models: it is China’s first leading speech reasoning model, with perception and reasoning capabilities that fully match Gemini 2.5 Pro. By integrating our proprietary Step MPS framework, we have achieved a world first: the model gains sophisticated reasoning and highly human-like interactive intelligence without adding any latency, so there is no time gap between thinking and responding.