About
I’m a researcher studying how paraverbal acoustic variation (emotion, prosody, non-verbal vocalizations) is encoded inside audio/speech foundation models, and how those representations influence semantic prediction and reasoning in Speech-LLMs.
I focus on quantifying representation sensitivity, invariance, and leakage. My recent work includes (1) benchmarking acoustic representation leakage in Speech-LLMs (VoiceBBQ), and (2) developing disentangled and robust speech modeling components as tools for controlled analysis (FairSLM, VorTEX).
Research Interests
- Representation Analysis in Speech Foundation Models: Probing what is encoded across layers (emotion, prosody, speaker identity) and how it surfaces in downstream decisions.
- Controlled Evaluation for Speech Systems: Designing attribute-controlled evaluation to measure invariance, sensitivity, and leakage under paraverbal variation.
- Robust Speech Systems as Representation Tools: Using target speech extraction (TSE) and other robust front-ends to analyze how overlap and noise perturb internal representations and semantic stability.
- Disentangled Audio Representation: Learning attribute-invariant latent spaces for controllability and generalization under stylistic changes.
Education
- B.S. in Computer Science and Engineering, Chung-Ang University (Mar 2019 – Aug 2025)
Featured Publications
- VoiceBBQ (co-author, EMNLP 2025 Main)
  - Benchmark for diagnosing content vs. acoustic social bias (and representation leakage) in Speech-LLMs via attribute-controlled minimal pairs.
- FairSLM: Mitigating Acoustic Social Bias via Disentangled Speech Representations (first author, manuscript under review)
  - Disentangles paraverbal acoustic attributes from semantic content to improve robustness to stylistic variation and associated social bias.
- VorTEX: Various Overlap Ratio for Target Speech Extraction (co-first author, manuscript under review)
  - Robust TSE architecture designed to generalize across diverse overlap ratios and low-SNR conditions in multi-speaker environments.
- Acoustic-based Gender Differentiation in Speech-aware Language Models (co-author, submitted to TACL 2026)
  - Systematic analysis of how identical prompts can elicit different responses depending on speaker gender cues.
Selected Projects
- Korean Acting-Tone Emotional/Paraverbal Speech Corpus (Assistant)
  - Built an acted corpus with controlled linguistic content to isolate paraverbal variation; expanded with TTS/VC augmentation and curated contrastive pairs for probing/evaluation.
- AudioPoli: Auditory Emergency-Scene Classification (Google Solution Challenge Global Top 100)
  - Developed an environmental sound classification system for emergency situation detection.
- AULO: Short-length Sound Effect Retrieval
  - Implemented a sound effect retrieval system using VAE-based acoustic encoders.
Awards & Honors
- Solution Challenge – Global Top 100, Google Developers Groups (2024)
- Full-tuition Merit-based Scholarship, Chung-Ang University (2019 – 2025)
Skills
- Languages: Python, C/C++, Bash, CUDA (basic), SQL
- Frameworks: PyTorch, TensorFlow, Keras, PyTorch Lightning
- Toolkits: HuggingFace, Librosa, Amphion, OpenAI Whisper, ESPnet
- Infra: Git, Docker, GCP, Linux, LaTeX
Contact
- Email: seoljh301@gmail.com / seoljh0722@cau.ac.kr
- GitHub: github.com/seoljh301
