Chronological Thinking in Full-Duplex Spoken Dialogue Language Models
Abstract: Recent advances in spoken dialogue language models (SDLMs) reflect growing interest in shifting from turn-based to full-duplex systems, where the models continuously …

Fei Tian is an Audio LLM Researcher at StepFun, pioneering the next generation of speech AI. He was instrumental in developing groundbreaking projects including Step-Audio, Step-Audio 2, Step-Audio R1, and Step-MPS. His work introduced China's leading speech reasoning model (benchmarked against Gemini 2.5 Pro), the "thinking-while-speaking" framework, and the integration of Chain-of-Thought (CoT) reasoning into the world's first industrial-grade audio LLM. Previously at ByteDance, he spearheaded the architectural evolution of speech models for core products such as TikTok and CapCut. Fei is passionately committed to contributing his expertise to the journey toward Artificial General Intelligence.
MS Speech Processing, Nanjing University
Visiting Scholar, University of Technology Sydney
BS Physics and Acoustics, Nanjing University
Abstract: This paper presents Step-Audio 2, an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation. By integrating …
Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to …
Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face …
Leading the research and development of next-generation, end-to-end Audio LLMs, specializing in advanced speech understanding, interactive systems, and reinforcement learning.
The world's first industrial-grade end-to-end audio LLM with deep thinking capabilities, achieving SOTA performance across multiple understanding and dialogue tasks.