About

I’m a researcher focused on how paraverbal acoustic variation (such as emotion, prosody, and non-verbal vocalizations) is encoded in audio/speech foundation models, and how those representations influence semantic prediction and reasoning in Speech-LLMs.

My work centers on quantifying representation sensitivity, invariance, and leakage. Recent examples include (1) benchmarking acoustic representation leakage in Speech-LLMs (VoiceBBQ) and (2) developing disentangled and robust speech modeling components as tools for controlled analysis (FairSLM, VorTEX).

Research Interests

  • Representation Analysis in Speech Foundation Models: Probing what is encoded across layers (emotion, prosody, speaker identity) and how it surfaces in downstream decisions; a short probing sketch follows this list.
  • Controlled Evaluation for Speech Systems: Designing attribute-controlled evaluation to measure invariance, sensitivity, and leakage under paraverbal variation.
  • Robust Speech Systems as Representation Tools: Using target speech extraction (TSE) and other robust front-ends to analyze how overlap and noise perturb internal representations and semantic stability.
  • Disentangled Audio Representation: Learning attribute-invariant latent spaces for controllability and generalization under stylistic changes.
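
A minimal sketch of the layer-wise probing mentioned in the first item above (illustrative only, not code from any project listed here; the wav2vec 2.0 checkpoint is just a convenient public model, and the waveforms and emotion labels are random placeholders standing in for a real labeled corpus):

    # Layer-wise linear probing sketch: how linearly decodable is a
    # paraverbal attribute from each layer of a speech encoder?
    import torch
    from transformers import Wav2Vec2Model
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    model = Wav2Vec2Model.from_pretrained(
        "facebook/wav2vec2-base", output_hidden_states=True
    )
    model.eval()

    # Placeholder data: 32 one-second clips at 16 kHz with random binary
    # "emotion" labels; a real probe would use labeled speech.
    waveforms = torch.randn(32, 16000)
    labels = torch.randint(0, 2, (32,)).numpy()

    with torch.no_grad():
        hidden_states = model(waveforms).hidden_states  # one tensor per layer

    # One linear probe per layer on time-pooled features; held-out accuracy
    # serves as a rough layer-wise sensitivity profile for the attribute.
    for layer_idx, layer in enumerate(hidden_states):
        feats = layer.mean(dim=1).numpy()  # (batch, hidden_dim)
        x_tr, x_te, y_tr, y_te = train_test_split(
            feats, labels, test_size=0.25, random_state=0
        )
        probe = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
        print(f"layer {layer_idx:2d}: probe accuracy = {probe.score(x_te, y_te):.2f}")

Repeating this for different attributes and comparing the resulting layer profiles is the kind of controlled, attribute-level analysis the items above refer to.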

Education

  • B.S. in Computer Science and Engineering, Chung-Ang University (Mar 2019 – Aug 2025)

Publications

  • VoiceBBQ (co-author, EMNLP 2025 Main)
    • Benchmark for diagnosing content vs. acoustic social bias (and representation leakage) in Speech-LLMs via attribute-controlled minimal pairs.
  • FairSLM: Mitigating Acoustic Social Bias via Disentangled Speech Representations (first author, Manuscript under review)
    • Disentangles paraverbal acoustic attributes from semantic content to improve robustness to stylistic variation and mitigate the associated social bias.
  • VorTEX: Various Overlap Ratio for Target Speech Extraction (co-first author, Manuscript under review)
    • Robust TSE architecture designed to generalize across diverse overlap ratios and low-SNR conditions in multi-speaker environments.
  • Acoustic-based Gender Differentiation in Speech-aware Language Models (co-author, Submitted to TACL 2026)
    • Systematic analysis of how identical prompts can elicit different responses depending on speaker gender cues.

Selected Projects

  • Korean Acting-Tone Emotional/Paraverbal Speech Corpus (Assistant)
    • Built an acted corpus with controlled linguistic content to isolate paraverbal variation; expanded with TTS/VC augmentation and curated contrastive pairs for probing/evaluation.
  • AudioPoli: Auditory Emergency-Scene Classification (Google Solution Challenge Global Top 100)
    • Developed an environmental sound classification system for emergency situation detection.
  • AULO: Short-length Sound Effect Retrieval
    • Implemented a sound effect retrieval system using VAE-based acoustic encoders.

Awards & Honors

  • Solution Challenge – Global Top 100, Google Developer Groups (2024)
  • Full-tuition Merit-based Scholarship, Chung-Ang University (2019 – 2025)

Skills

  • Languages: Python, C/C++, Bash, CUDA (basic), SQL
  • Frameworks: PyTorch, TensorFlow, Keras, PyTorch Lightning
  • Toolkits: HuggingFace, Librosa, Amphion, OpenAI Whisper, ESPnet
  • Infra: Git, Docker, GCP, Linux, LaTeX

Contact