Can you hear me now? AI-coustics aims to fight noisy audio with generative AI

Noisy recordings of interviews and speeches are the bane of a sound engineer's existence. But a German startup hopes to solve this problem with a novel technical approach that uses generative AI to enhance the clarity of voices in videos.

Today, AI-coustics emerged from stealth with €1.9 million in funding. According to co-founder and CEO Fabian Seipel, AI-coustics' technology goes beyond standard noise suppression and works across any device and any speaker.

"Our primary mission is to make every digital interaction, be it a conference call, a consumer device or an off-the-cuff social media video, as clear as a broadcast from a professional studio," Seipel said in an interview with TechCrunch.

Seipel, a trained audio engineer, founded AI-coustics in 2021 with Corvin Jaedicke, a machine learning lecturer at the Technical University of Berlin. Seipel and Jaedicke met while studying audio engineering at TU Berlin, where they often ran into poor audio quality in the online courses and tutorials they had to take.

"Our personal mission is to overcome the pervasive problem of poor audio quality in digital communications," said Seipel. "Because my hearing is mildly impaired from music production in my early twenties, I have always struggled with online content and lectures, which is what led us to work on speech quality and intelligibility in the first place."

The market for AI-powered noise suppression and speech enhancement software is already well established. AI-coustics' competitors include Insoundz, which uses generative AI to enhance streamed and pre-recorded voice clips, and Veed.io, a video editing suite with tools to remove background noise from clips.

However, Seipel says AI-coustics takes a novel approach to developing the AI mechanisms that do the actual noise reduction work.

The startup uses a model trained on voice samples recorded in its studio in Berlin, AI-coustics' hometown. People are paid to record samples (Seipel wouldn't say how much), which are then added to a data set used to train AI-coustics' noise-reducing model.

"We have developed a novel approach to handling audio artifacts and problems, e.g. noise, reverb, compression, band-limited microphones, distortion, clipping and so on, through the training process," said Seipel.
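Seipel didn't detail how training exposes the model to these artifacts. One common technique in speech enhancement research is to take clean recordings and degrade them synthetically, producing paired (degraded, clean) examples a model can learn to reverse. The Python sketch below illustrates that general idea for three of the artifacts he lists: noise, a band-limited microphone and clipping. It is not AI-coustics' method; the function names, parameter ranges and the crude moving-average filter are illustrative assumptions.

```python
import math
import random


def add_noise(clean, snr_db, rng):
    """Mix Gaussian noise into `clean` at the requested signal-to-noise ratio in dB."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Scale the noise so that 10 * log10(signal_power / scaled_noise_power) == snr_db.
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(clean, noise)]


def band_limit(samples, window):
    """Crude low-pass filter (moving average), standing in for a cheap microphone."""
    out, acc = [], 0.0
    for i, x in enumerate(samples):
        acc += x
        if i >= window:
            acc -= samples[i - window]  # drop the sample that left the window
        out.append(acc / min(i + 1, window))
    return out


def clip(samples, limit):
    """Hard-clip samples to [-limit, limit], mimicking an overdriven input stage."""
    return [max(-limit, min(limit, x)) for x in samples]


def make_training_pair(clean, rng):
    """Return a (degraded, clean) pair; a model would learn to map the first to the second.

    The parameter ranges here are hypothetical, chosen only for illustration.
    """
    degraded = add_noise(clean, snr_db=rng.uniform(0.0, 20.0), rng=rng)
    degraded = band_limit(degraded, window=rng.choice([1, 2, 4]))
    degraded = clip(degraded, limit=rng.uniform(0.3, 0.8))
    return degraded, list(clean)


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16_000
    # One second of a 440 Hz tone as a stand-in for a clean studio recording.
    clean = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    degraded, target = make_training_pair(clean, rng)
    print(len(degraded), max(abs(x) for x in degraded))
```

Randomizing the degradation per example is the point of the design: the same clean recording can yield many different noisy versions, so the model sees a wide range of conditions without anyone re-recording anything.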

I'd wager some will take issue with AI-coustics' one-time compensation scheme for its data contributors, given that the model the startup is building could prove quite lucrative in the long run. (There is a healthy debate about whether creators of training data for AI models deserve residuals for their contributions.) But perhaps the bigger and more immediate concern is bias.

It is well known that speech recognition algorithms can develop biases, and those biases end up harming users. A study published in the Proceedings of the National Academy of Sciences showed that speech recognition systems from leading companies were twice as likely to incorrectly transcribe audio from Black speakers as from white speakers.

To counteract this, AI-coustics is focusing on recruiting "diverse" voice sample contributors, according to Seipel. He added: "Size and diversity are key to eliminating bias and making the technology work across languages, speaker identities, ages, accents and genders."

It wasn't the most scientific test, but I uploaded three video clips (an interview with an 18th-century farmer, a car driving demo and a protest over the Israeli-Palestinian conflict) to the AI-coustics platform to see how well it performed with each. AI-coustics did deliver on its promise to improve clarity; to my ears, the processed clips had far less background noise drowning out the speakers.

Here's the 18th-century farmer clip before processing:

And after:

Seipel expects AI-coustics' technology to be used for both real-time and recorded speech enhancement, and perhaps even to be embedded in devices such as soundbars, smartphones and headphones to automatically improve speech intelligibility. Currently, AI-coustics offers a web app and API for post-processing audio and video recordings, as well as an SDK that brings the AI-coustics platform into existing workflows, apps and hardware.

Seipel says AI-coustics, which makes money through a mix of subscriptions, on-demand pricing and licensing, currently has five enterprise customers and 20,000 users (though not all of them paying). The roadmap for the next few months includes expanding the company's four-person team and improving the underlying speech enhancement model.

"Prior to our initial investment, AI-coustics ran a fairly lean operation with a low burn rate to weather the difficulties of the VC investment market," Seipel said. "AI-coustics now has an extensive network of investors and mentors in Germany and the U.K. for advice. A strong technology base and the ability to address different markets with the same database and core technology give the company flexibility and room for smaller pivots."

Asked whether audio mastering technologies like AI-coustics could steal jobs, as some experts fear, Seipel pointed to AI-coustics' potential to speed up time-consuming tasks that currently fall to human audio engineers.

"A content creation studio or broadcast manager can save time and money by automating parts of the audio production process with AI-coustics while maintaining the highest speech quality," he said. "Speech quality and intelligibility remain a vexing problem in nearly every consumer or professional device, as well as in the production and consumption of content. Any application that involves recording, processing or transmitting speech can potentially benefit from our technology."

The funding came in the form of an equity and debt capital tranche from Connect Ventures, Inovia Capital, FOV Ventures and Ableton CFO Jan Bohl.

This article was originally published at techcrunch.com