Research Fellow in Generative Audio AI

University of Surrey


Contract Type
Full Time

Salary
£36,024 to £38,205 per annum

Closing Date
21 April 2024


The University of Surrey is a global community of ideas and people, dedicated to life-changing education and research.

We are ambitious and have a bold vision of what we want to achieve – shaping ourselves into one of the best universities in the world, which we are achieving through the talents and endeavour of every employee.

Our culture empowers people to achieve this aim and to collectively, and individually, make a real difference.

The role

Applications are invited for a 12-month Research Fellow (RF) position within the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, UK, to work on generative AI for audio generation from text and video prompts.

The post is funded by a leading generative AI startup. The focus will be to develop generative machine learning models and signal processing algorithms for sound generation, given prompts from text and/or video. This work builds on CVSSP's recent contributions to generative AI models for audio generation, such as AudioLDM and Re-AudioLDM, with a focus on scaling up the models with additional datasets and extending them to include more modalities such as video.

The post-holder will be based in CVSSP and will work under the direction of the Principal Investigator, Prof Wenwu Wang, with co-supervision by Prof Mark Plumbley, and in collaboration with the industrial partner.

About you

The post-holder is expected to have a PhD degree (or equivalent) in machine learning, generative AI, acoustic signal processing, cross-modal processing among audio, text and video, or a related area in electronic engineering, applied mathematics, computer science, or statistics. The post-holder is expected to have strong analytical skills and programming skills in Python, Matlab or C/C++. Preference will be given to those who have experience in generative AI models, audio generation, or cross-modal translation (such as text-to-audio or video-to-audio), but candidates who have experience in machine learning and audio-visual processing are welcome to apply.

How to apply

Please submit a CV and cover letter with your application via the University website. For informal inquiries, please contact Prof Wenwu Wang.

Please note that interviews are scheduled to take place in the week commencing 29 April.

CVSSP is an International Centre of Excellence for research in Audio-Visual Machine Perception and AI, with over 180 researchers. The Centre has state-of-the-art audio and video capture and analysis facilities supporting research in real-time video and audio processing and visualisation. CVSSP has a compute facility with 200 GPUs and >2PB of high-speed secure storage.

Posted on 27th March 2024 in Job Opportunities in Acoustics, Early Careers Group