Soundify: Matching Sound Effects to Video
NeurIPS 2021

David Chuan-En Lin¹, Anastasis Germanidis², Cristobal Valenzuela², Yining Shi², Nikolas Martelaro¹
¹Carnegie Mellon University, ²Runway

Teaser figure: Soundify matches sound effects (bold) and ambients (italics) by detecting "sound emitters".

Abstract

In the art of video editing, sound is really half the story. A skilled video editor overlays sounds, such as effects and ambients, over footage to add character to an object or immerse the viewer within a space. However, through formative interviews with professional video editors, we found that this process can be extremely tedious and time-consuming.

We introduce Soundify, a system that matches sound effects to video. By leveraging labeled, studio-quality sound effects libraries and extending CLIP, a neural network with impressive zero-shot image classification capabilities, into a "zero-shot detector", we are able to produce high-quality results without resource-intensive correspondence learning or audio generation.

We encourage you to have a look at, or better yet, have a listen to, the results at https://chuanenlin.com/soundify.

(c) 2021 David Chuan-En Lin. Website based on AcademicPages.
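The core idea of matching labeled library sounds with CLIP-style zero-shot classification can be illustrated with a small sketch. It assumes an image embedding for a video frame and text embeddings for each sound-effect label have already been computed (in practice by CLIP's image and text encoders). It then ranks the labels by cosine similarity, followed by a temperature-scaled softmax. This is not Soundify's implementation. The function names (`cosine_similarity`, `softmax`, `zero_shot_match`), the temperature value, the labels, and the three-dimensional toy vectors are all hypothetical stand-ins, not real CLIP outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_match(image_embedding, label_embeddings, temperature=0.01):
    """Hypothetical helper: rank sound-effect labels for one frame.

    label_embeddings maps a library label to the embedding of a text
    prompt built from it. Returns (label, probability) pairs sorted
    from most to least likely.
    """
    labels = list(label_embeddings)
    scores = [cosine_similarity(image_embedding, label_embeddings[l]) for l in labels]
    probs = softmax(scores, temperature)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for encoder outputs (NOT real CLIP embeddings).
image_embedding = [0.9, 0.1, 0.0]
label_embeddings = {
    "car engine": [1.0, 0.0, 0.0],
    "birds chirping": [0.0, 1.0, 0.0],
    "ocean waves": [0.0, 0.0, 1.0],
}

ranking = zero_shot_match(image_embedding, label_embeddings)
print(ranking[0][0])  # car engine
```

Because every label comes from a labeled library, the top-ranked label can point directly to an existing studio-quality recording. No audio has to be generated, which is the trade-off the abstract highlights.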