About
I’m a researcher studying how paraverbal acoustic variation (emotion, prosody, non-verbal vocalizations) is encoded inside audio/speech foundation models, and how those representations influence semantic prediction and reasoning in Speech-LLMs.
I focus on quantifying representation sensitivity, invariance, and leakage. My recent work includes (1) benchmarking acoustic representation leakage in Speech-LLMs (VoiceBBQ), and (2) developing disentangled and robust speech modeling components as tools for controlled analysis (FairSLM, VorTEX).
Research Interests
- Representation Analysis in Speech Foundation Models: Probing what is encoded across layers (emotion, prosody, speaker identity) and how it surfaces in downstream decisions.
- Controlled Evaluation for Speech Systems: Designing attribute-controlled evaluation to measure invariance, sensitivity, and leakage under paraverbal variation.
- Robust Speech Systems as Representation Tools: Using target speech extraction (TSE) and other robust front-ends to analyze how overlap and noise perturb internal representations and semantic stability.
- Disentangled Audio Representation: Learning attribute-invariant latent spaces for controllability and generalization under stylistic changes.
Education
- B.S. in Computer Science and Engineering, Chung-Ang University (Mar 2019 – Aug 2025)
Featured Publications
- VoiceBBQ (co-author, EMNLP 2025 Main)
  - Benchmark for diagnosing content vs. acoustic social bias (and representation leakage) in Speech-LLMs via attribute-controlled minimal pairs.
- FairSLM: Mitigating Acoustic Social Bias via Disentangled Speech Representations (first author, manuscript under review)
  - Disentangles paraverbal acoustic attributes from semantic content to improve robustness to stylistic variation and associated social bias.
- VorTEX: Various Overlap Ratio for Target Speech Extraction (co-first author, manuscript under review)
  - Robust TSE architecture designed to generalize across diverse overlap ratios and low-SNR conditions in multi-speaker environments.
- Acoustic-based Gender Differentiation in Speech-aware Language Models (co-author, submitted to TACL 2026)
  - Systematic analysis of how identical prompts can elicit different responses depending on speaker gender cues.
Selected Projects
- Korean Acting-Tone Emotional/Paraverbal Speech Corpus (Assistant)
  - Built an acted corpus with controlled linguistic content to isolate paraverbal variation; expanded with TTS/VC augmentation and curated contrastive pairs for probing/evaluation.
- AudioPoli: Auditory Emergency-Scene Classification (Google Solution Challenge Global Top 100)
  - Developed an environmental sound classification system for emergency situation detection.
- AULO: Short-length Sound Effect Retrieval
  - Implemented a sound effect retrieval system using VAE-based acoustic encoders.
Awards & Honors
- Solution Challenge – Global Top 100, Google Developers Groups (2024)
- Full-tuition Merit-based Scholarship, Chung-Ang University (2019 – 2025)
Skills
- Languages: Python, C/C++, Bash, CUDA (basic), SQL
- Frameworks: PyTorch, TensorFlow, Keras, PyTorch Lightning
- Toolkits: HuggingFace, Librosa, Amphion, OpenAI Whisper, ESPnet
- Infra: Git, Docker, GCP, Linux, LaTeX
Contact
- Email: seoljh301@gmail.com / seoljh0722@cau.ac.kr
- GitHub: github.com/seoljh301
