New detection method sniffs out audio deepfakes

Researchers have developed a technique that distinguishes between a real human voice and an imitation.

Last year, a series of videos featuring a simulated Tom Cruise took social media by storm.

They were deepfakes – digital fabrications powered by artificial intelligence, in which ‘deep learning’ algorithms learn the movements or sounds of two different recordings and combine them to produce realistic-looking fake media.

There are two kinds of deepfakes: video deepfakes, which reproduce the look and voice of an actual person, and audio deepfakes, which imitate only a person’s voice. While deepfake detection software has received a lot of attention, it has mainly focused on analysing image files.

Now, researchers have developed a deepfake audio detection method designed to spot increasingly realistic audio deepfakes.

To do so, Joel Frank and Lea Schönherr, from the Horst Görtz Institute for IT Security at Ruhr-Universität Bochum, amassed around 118,000 samples of synthesised voice recordings – almost 196 hours of fake audio in both English and Japanese.

“Such a dataset for audio deepfakes did not exist before,” explained Schönherr in a press release announcing the new method. “But in order to improve the methods for detecting fake audio files, you need all this material.”

To ensure the dataset was diverse, the team used six different AI algorithms to generate the audio snippets. The researchers then plotted the frequency distribution of each artificial file as a spectrogram, compared it against recordings of real speech, and patterns began to emerge.
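To give a flavour of that step – this is a minimal sketch with standard Python tooling, not the researchers’ actual pipeline, and the file names are placeholders – a recording’s frequency distribution can be plotted as a spectrogram like so:

# Minimal sketch: plot the spectrograms of a real and a synthesised
# recording for visual comparison. File names are placeholders.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

def plot_spectrogram(ax, path, title):
    rate, samples = wavfile.read(path)
    if samples.ndim > 1:               # mix stereo down to mono
        samples = samples.mean(axis=1)
    freqs, times, sxx = spectrogram(samples, fs=rate, nperseg=1024)
    # a log scale makes quiet high-frequency detail visible
    ax.pcolormesh(times, freqs, 10 * np.log10(sxx + 1e-10), shading="gouraud")
    ax.set(title=title, xlabel="Time [s]", ylabel="Frequency [Hz]")

fig, (ax_real, ax_fake) = plt.subplots(2, 1, figsize=(8, 6), sharex=True)
plot_spectrogram(ax_real, "real_voice.wav", "Real recording")
plot_spectrogram(ax_fake, "fake_voice.wav", "Synthesised recording")
fig.tight_layout()
plt.show()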

This comparison “revealed subtle differences in the high frequencies between real and fake files,” the two researchers highlighted, noting that the difference was pronounced enough to tell a real file from a fake one.
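The intuition can be illustrated with a toy measurement – again, not the paper’s method, and the 6 kHz cutoff is an arbitrary assumption – comparing how much of each file’s spectral energy sits in the high frequencies:

# Toy illustration, not the researchers' method: compare the share
# of spectral energy above an arbitrarily chosen cutoff frequency.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def high_freq_energy_ratio(path, cutoff_hz=6000):
    rate, samples = wavfile.read(path)
    if samples.ndim > 1:               # mix stereo down to mono
        samples = samples.mean(axis=1)
    freqs, psd = welch(samples, fs=rate, nperseg=2048)
    return psd[freqs >= cutoff_hz].sum() / psd.sum()

for path in ("real_voice.wav", "fake_voice.wav"):   # placeholder files
    print(path, f"{high_freq_energy_ratio(path):.4f}")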

These spectrograms show the frequency distribution over time of a real (top) and a fake audio file (bottom). The subtle differences in the higher frequencies are marked with red circles.

Based on those findings, which were presented at last month’s Conference on Neural Information Processing Systems, Frank and Schönherr developed a set of algorithms that harness this technique to distinguish between a real human voice and an imitation.
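In its simplest imaginable form – and the published models are far more sophisticated than this hedged sketch, where the files, labels and single spectral feature are all placeholders – such a detector amounts to a binary classifier trained on frequency features:

# Hedged sketch of a detector in its simplest form: a binary
# classifier over one spectral feature. Files, labels and the
# cutoff are placeholders, not the paper's actual pipeline.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch
from sklearn.linear_model import LogisticRegression

def hf_ratio(path, cutoff_hz=6000):
    rate, samples = wavfile.read(path)
    if samples.ndim > 1:
        samples = samples.mean(axis=1)
    freqs, psd = welch(samples, fs=rate, nperseg=2048)
    return psd[freqs >= cutoff_hz].sum() / psd.sum()

paths = ["real_0.wav", "real_1.wav", "fake_0.wav", "fake_1.wav"]
labels = np.array([0, 0, 1, 1])        # 0 = real, 1 = synthesised

X = np.array([[hf_ratio(p)] for p in paths])
clf = LogisticRegression().fit(X, labels)
print("fake probability:", clf.predict_proba([[hf_ratio("unknown.wav")]])[0, 1])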

Their novel software is only the beginning, as they state that “these algorithms are designed as a starting point for other researchers to develop novel detection methods.”

With the process becoming cheaper and more widely accessible, the amount of deepfake content has been growing at an alarming rate.

While many have sought to commercialise the technology by licensing it to gaming and social media firms, the potential misuses of deepfakes remain worrying.

While the technology is still largely experimental, opportunistic criminals have already begun to deploy audio deepfakes in online and telephone scams.

One notable case in 2019 saw criminals use a generated voice recording to impersonate a chief executive and defraud a UK-based energy company of $243,000.
