Mr. You Zhang | Engineering | Best Researcher Award
University of Rochester, United States
You (Neil) Zhang is a Ph.D. candidate in Electrical and Computer Engineering at the University of Rochester, specializing in machine learning for speech, acoustics, and audio signal processing. His research focuses on spatial audio (HRTF personalization), speech anti-spoofing, singing voice deepfake detection, and audio-visual learning. He has held research roles at Dolby, Meta, Microsoft, Tencent, and ByteDance, contributing to perceptual HRTF modeling and audio-visual deepfake detection.
Education:
- Ph.D. in Electrical & Computer Engineering (Expected 2025), University of Rochester
- M.S., University of Rochester
- B.Eng., University of Electronic Science & Technology of China
- Exchange Program, UC Berkeley
Research Interests:
- Spatial Audio & HRTF Personalization
- Speech Deepfake Detection & Audio Security
- Multimodal Learning: Audio-Visual & Emotional Speech Synthesis
Honors & Fellowships:
- IEEE SPS Scholarship (2024)
- NIJ Graduate Research Fellowship (2023)
- ICASSP Rising Star in Signal Processing (2023)
- Open Scholarship Award @ UR (2025)
Research & Industry Experience:
- Dolby Labs: Sr. Researcher, Multimodal Spatial Audio
- Meta Reality Labs: HRTF Perceptual Learning
- Microsoft, Tencent, ByteDance, IngenID: AI R&D Internships
- Audio Information Research Lab, UR: Deepfake Detection, AV Speech, HRTF Neural Fields
Selected Publications:
- Published in IEEE T-MM, SPL, ICASSP, Interspeech, and NAACL
- Co-organizer of the SVDD Challenge at SLT 2024 & MIREX 2024
- Contributor to the Handbook of Biometric Anti-spoofing (Springer)
Talks & Tutorials:
- Invited speaker at CMU, NII Japan, ISCA SPSC
- Tutorials @ ASA, ICME, AES (Topics: HRTF, Deepfakes, ML for Acoustics)
Teaching & Mentorship:
- TA for Machine Learning, Audio Signal Processing, Random Processes
- Mentored 10+ undergraduate and graduate students at UR, Tsinghua, and UESTC
Professional Service:
- Reviewer for IEEE TASLP, TPAMI, ICASSP, Interspeech, CVPR Workshops
- Member: IEEE, ASA, ACM, AES
- DEI Committee @ UR ECE | Organizer of AR/VR Events
Skills:
- Programming: Python, MATLAB, C, Java
- Tools: Git, Linux, PyTorch, Slurm
- Languages: English, Mandarin
Hobbies & More:
- Half-Marathon Finisher
- Enjoys stand-up paddleboarding, travel, and badminton
Google Scholar Citation Metrics:
- Citations: 751 (All time) | 751 (Since 2020)
- h-index: 12 (All time) | 12 (Since 2020)
- i10-index: 14 (All time) | 14 (Since 2020)
Publication Top Notes:
- One-class Learning Towards Synthetic Voice Spoofing Detection
  Y. Zhang, F. Jiang, Z. Duan
  IEEE Signal Processing Letters, vol. 28, pp. 937–941, 2021.
- Speech Driven Talking Face Generation from a Single Image and an Emotion Condition
  S.E. Eskimez, Y. Zhang, Z. Duan
  IEEE Transactions on Multimedia, vol. 24, pp. 3480–3490, 2021.
- UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021
  X. Chen, Y. Zhang*, G. Zhu*, Z. Duan
  ASVspoof 2021 Workshop, 2021.
- SingFake: Singing Voice Deepfake Detection
  Y. Zang, Y. Zhang*, M. Heydari, Z. Duan
  IEEE ICASSP, 2024.
- An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems
  Y. Zhang, G. Zhu, F. Jiang, Z. Duan
  Interspeech, pp. 4309–4313, 2021.
- SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing
  S. Ding, Y. Zhang, Z. Duan
  IEEE ICASSP, 2023.
- A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification
  Y. Zhang, G. Zhu, Z. Duan
  Odyssey: The Speaker and Language Recognition Workshop, pp. 77–84, 2022.
- Global HRTF Personalization Using Anthropometric Measures
  Y. Wang, Y. Zhang, Z. Duan, M. Bocko
  Audio Engineering Society (AES) 150th Convention, 2021.
- Rethinking Audio-Visual Synchronization for Active Speaker Detection
  A. Wuerkaixi, Y. Zhang, Z. Duan, C. Zhang
  IEEE MLSP, 2022.
- CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
  Y. Zang, J. Shi, Y. Zhang, et al.
  Interspeech, pp. 4783–4787, 2024.
- VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music
  J. Shi, H. Shim, J. Tian, Y. Zhang, et al.
  NAACL (Demo Track), 2025.
- HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields
  Y. Zhang, Y. Wang, Z. Duan
  IEEE ICASSP, 2023.
- SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge
  Y. Zhang, Y. Zang, J. Shi, R. Yamamoto, T. Toda, Z. Duan
  IEEE SLT, pp. 782–787, 2024.
- DyViSE: Dynamic Vision-Guided Speaker Embedding for Audio-Visual Speaker Diarization
  A. Wuerkaixi, K. Yan, Y. Zhang, Z. Duan, C. Zhang
  IEEE MMSP, 2022.
- Predicting Global Head-Related Transfer Functions from Scanned Head Geometry Using Deep Learning and Compact Representations
  Y. Wang, Y. Zhang, Z. Duan, M. Bocko
  arXiv preprint, arXiv:2207.14352, 2022.
- Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech
  E. Zhou, Y. Zhang, Z. Duan
  IEEE ICASSP, 2024.
- SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan
  Y. Zhang, Y. Zang, J. Shi, et al.
  arXiv preprint, arXiv:2405.05244, 2024.
- ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech
  X. Wang, H. Delgado, Y. Zhang, et al.
  Computer Speech & Language, 2025.
- Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions
  K. Zhou, Y. Zhang, S. Zhao, et al.
  arXiv preprint, arXiv:2409.16681, 2024.
- Mitigating Cross-Database Differences for Learning Unified HRTF Representation
  Y. Wen, Y. Zhang, Z. Duan
  IEEE WASPAA, 2023.