Closing Date
8 April 2021

Applications are invited for a 22-month Research Fellow (RF) position within the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, UK, to work on a project titled “Automated Captioning of Image and Audio for Visually and Hearing Impaired”. This is a collaborative project between the University of Surrey and Izmir Katip Celebi University (IKCU), Turkey, with project partners from charities and industrial sectors working with the hearing and visually impaired. The project aims to address fundamental challenges in audio and image captioning, develop new algorithms that improve captioning performance, and build application tools that the hearing and visually impaired could use to access audio and image content.

The work at Surrey will focus on new methods and algorithms for automated audio captioning and natural language description of audio. This work builds on the significant contributions of CVSSP in the areas of acoustic scene analysis, audio event detection, environmental sound recognition, and audio tagging, together with preliminary results on audio captioning. This new project offers an opportunity to take this work to the next stage and demonstrate the benefit of such technologies for the hearing and visually impaired. A smartphone-based prototype for audio and visual captioning will be developed jointly by Surrey and IKCU. New data will also be gathered, including audio-visual data for captioning and user feedback on the prototype system.

The postholder will be responsible for investigating and developing audio signal processing and machine learning algorithms for natural language description of sound, and for implementing software to prototype the concept and algorithms. The postholder should have doctoral-level (or equivalent) research and development experience in electronic engineering, applied mathematics, computer science, artificial intelligence, machine learning, natural language processing, or related subjects. The postholder should ideally have experience in at least one of the following areas: audio captioning, machine description of audio, audio classification, audio tagging, image captioning, video captioning, translation between audio/image and text, and/or translation between audio and video.

The postholder will be based in CVSSP and will work under the direction of the Principal Investigator, Prof Wenwu Wang, with co-supervision by Prof Sabine Braun, Director of the Centre for Translation Studies at the University of Surrey, and in collaboration with Dr Volkan Kilic from IKCU, Turkey.

CVSSP is an International Centre of Excellence for research in Audio-Visual Machine Perception, with over 150 researchers, a grant portfolio of £24M (£17.5M EPSRC) from EPSRC, EU, InnovateUK, charity and industry, and a turnover of £7M/annum. The Centre has state-of-the-art acoustic capture and analysis facilities and a Visual Media Lab with video and audio capture facilities supporting research in real-time video and audio processing and visualisation. CVSSP has a compute facility with 120 GPUs and >1PB of high-speed secure storage.

For informal inquiries, please contact Prof Wenwu Wang.

More details and application link are here.

Posted on 18th March 2021 in Job Opportunities in Acoustics