person profile

Congrui Du

Congrui Du — researcher or builder tracked in the Angestrom contributor network.

4Connections

1Papers

0Models

0Repos

0News

Papers · 1

Unlocking Speech-Text Compositional Powers: Instruction-Following Speech Language Models without Instruction Tuning

Instruction tuning for speech language models (SLMs) is substantially more challenging than for text-based large language models (LLMs), as it requires learning a new modality and a wide range of speech-specific instructions in addition to those supported by text LLMs. Existing SLM training approaches largely replicate the text LLM training paradigm by synthesizing large-scale speech pre-training and instruction-tuning datasets. However, this strategy is difficult to scale, since speech sequences are significantly longer than text sequences. In this paper, we propose SpeechCombine, an instruct