[Coming soon!] Step-Audio R1: Achieving Gemini 2.5 Pro-Level Reasoning in a Zero-Latency Speech LLM

Oct 1, 2025

Fei Tian


Abstract

Step-Audio R1 represents the DeepSeek R1 moment for large speech models: it is China's first leading speech reasoning model, with perception and reasoning capabilities that fully match those of Gemini 2.5 Pro. By integrating our proprietary Step MPS framework, we have achieved a world first: the model gains sophisticated reasoning capabilities and highly human-like interactive intelligence without adding any latency, so there is no time gap between thinking and responding.

Last updated on Oct 1, 2025