Publications

Selected research papers from MILab

* Equal Contribution   ·   Co-Corresponding Author

2026

Seeing Through Touch teaser

Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions

Seongyu Kim, Seungwoo Lee, Hyeonggon Ryu, Joon Son Chung, Arda Senocak

CVPR 2026 († Corresponding Author)

Synthetic SSL teaser

How Far Can We Go With Synthetic Data for Audio-Visual Sound Source Localization?

Arda Senocak*, Sooyoung Park*, Tae-Hyun Oh, Joon Son Chung.

CVPR 2026 (* Equal Contribution)

CASS teaser

Cinematic Audio Source Separation Using Visual Cues

Kang Zhang*, Suyeon Lee*, Arda Senocak†, Joon Son Chung

CVPR 2026 († Co-Corresponding Author)

2025

CLIP SSL teaser

Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization

Sooyoung Park*, Arda Senocak*, Joon Son Chung

IJCV 2025 (* Equal Contribution)

MG-Audio teaser

Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation

Kang Zhang*, Trung X. Pham*, Suyeon Lee, Axi Niu, Arda Senocak, Joon Son Chung

NeurIPS 2025

SSL Alignment teaser

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

Arda Senocak*, Hyeonggon Ryu*, Junsik Kim*, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

TPAMI 2025 (* Equal Contribution) Impact Factor: 20.8

Seeing Speech and Sound teaser

Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes

Hyeonggon Ryu*, Seongyu Kim*, Joon Son Chung, Arda Senocak

CVPR 2025 († Corresponding Author)

AVHBench teaser

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Sung Bin Kim*, Hyun-Bin Oh*, JungMok Lee, Arda Senocak, Joon Son Chung, Tae-Hyun Oh

ICLR 2025

2024

Audio Mamba teaser

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Mehmet Hamza Erol*, Arda Senocak*, Jiu Feng, Joon Son Chung

IEEE SPL 2024 (* Equal Contribution)

ElasticAST teaser

ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions

Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak

Interspeech 2024 († Corresponding Author)

MIM-VGS teaser

Speech Guided Masked Image Modeling for Visually Grounded Speech

Jong-Bin Woo, Hyeonggon Ryu, Arda Senocak, Joon Son Chung

ICASSP 2024

Coarse to Fine teaser

From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers

Jiu Feng*, Mehmet Hamza Erol*, Joon Son Chung, Arda Senocak†

ICASSP 2024 († Corresponding Author)

CLIP Sound Localization teaser

Can CLIP Help Sound Source Localization?

Sooyoung Park*, Arda Senocak*, Joon Son Chung

WACV 2024 (* Equal Contribution)

2023

Cross-Modal Alignment teaser

Sound Source Localization is All about Cross-Modal Alignment

Arda Senocak*, Hyeonggon Ryu*, Junsik Kim*, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

ICCV 2023 (* Equal Contribution)

FlexiAST teaser

FlexiAST: Flexibility is What AST Needs

Jiu Feng*, Mehmet Hamza Erol*, Joon Son Chung, Arda Senocak

Interspeech 2023 († Corresponding Author)

Sound2Scene teaser

Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment

Sung Bin Kim, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh

CVPR 2023

Hindi As a Second Language teaser

Hindi As a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples

Hyeonggon Ryu*, Arda Senocak*, In So Kweon, Joon Son Chung

ICASSP 2023 (* Equal Contribution)

MarginNCE teaser

MarginNCE: Robust Sound Localization with a Negative Margin

Sooyoung Park*, Arda Senocak*, Joon Son Chung

ICASSP 2023 (* Equal Contribution)

AV Fusion teaser

Event-Specific Audio-Visual Fusion Layers: A Simple and New Perspective on Video Understanding

Arda Senocak*, Junsik Kim*, Tae-Hyun Oh, Dingzeyu Li, In So Kweon

WACV 2023 (* Equal Contribution)

2022

Hard Positive teaser

Learning Sound Localization Better from Semantically Similar Samples

Arda Senocak*, Hyeonggon Ryu*, Junsik Kim*, In So Kweon

ICASSP 2022 Oral (* Equal Contribution)

Short Version at Sight and Sound Workshop @ CVPR 2022

Less Can Be More teaser

Less Can Be More: Sound Source Localization With a Classification Model

Arda Senocak*, Hyeonggon Ryu*, Junsik Kim*, In So Kweon

WACV 2022 (* Equal Contribution)

Received Honorable Mention, 28th HumanTech Paper Award, Samsung Electronics Co., Ltd. ($2000)

2021

TPAMI Sound Localization teaser

Learning to Localize Sound Source in Visual Scenes: Analysis and Applications

Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon

TPAMI 2021 Impact Factor: 24.314

2018

CVPR Sound Localization teaser

Learning to Localize Sound Source in Visual Scenes

Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon

CVPR 2018

Sports Player Identification teaser

Part-based Player Identification using Deep Convolutional Representation and Multi-scale Pooling

Arda Senocak, Tae-Hyun Oh, Junsik Kim, In So Kweon

CVSports Workshop @CVPR 2018