Leading the research and development of next-generation, end-to-end Audio LLMs, specializing in advanced speech understanding, interactive systems, and reinforcement learning.
- Step-Audio R1 & Step MPS (Project Lead): Ushered in the “DeepSeek R1 moment” for audio LLMs by developing China’s #1 speech reasoning model, benchmarked directly against Gemini 2.5 Pro in perception and reasoning. Pioneered the Step MPS (Mind-Paced Speaking) “dual-brain” framework, a first-of-its-kind solution that enables complex CoT reasoning and highly empathetic, human-like interaction with zero additional latency, achieving true real-time “thinking-while-speaking.”
- Step-Audio 2 (Lead of Speech Understanding): Led the development of the world’s first industrial-grade end-to-end audio LLM with deep thinking capabilities. Introduced Chain-of-Thought (CoT) reasoning and audio reinforcement learning into speech models for the first time, achieving SOTA performance on ASR, paralinguistic understanding (emotion, tone, music), and reasoning tasks among both open-source and proprietary models. [arXiv:2507.16632]
- Step EditX (Co-Project Lead): Defined a new paradigm of instruction-based “conversational creation” for audio editing. Developed a groundbreaking model capable of zero-shot TTS, style transfer (30+ styles), emotion enhancement (14+ emotions), and one-click restoration. Delivered the industry’s first semantic-level, context-aware audio editing (insertion, deletion, modification) driven by natural language prompts, with faithful preservation of timbre and prosody.