IBM Security researchers have uncovered a surprising new threat in artificial intelligence: the ability to manipulate live conversations through a technique known as “audio jacking.”
Using generative AI and deepfake audio, the technique intercepts spoken words and alters them in real time, posing unprecedented risks to digital communications.
Exposing the threat of audio jacking
Audio jacking works by processing live audio from two-way communications, such as phone calls, and listening for specific keywords or phrases. When one of these triggers is detected, the AI intervenes and replaces the genuine audio with a manipulated deepfake version before it reaches its destination. In IBM's proof of concept, the researchers changed bank account details spoken during a conversation without either party detecting the swap.
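The interception loop described above can be sketched in a few lines. This is a minimal illustration, not IBM's actual proof of concept: `transcribe` and `synthesize` are stand-in stubs for real speech-to-text and voice-cloning models, and the trigger keywords and account numbers are invented for the example.

```python
TRIGGER_KEYWORDS = {"account", "routing"}


def transcribe(chunk: bytes) -> str:
    """Stub standing in for a real-time speech-to-text model."""
    return chunk.decode("utf-8", errors="ignore")


def synthesize(text: str, voice_sample: bytes) -> bytes:
    """Stub standing in for a voice-cloning text-to-speech model."""
    return text.encode("utf-8")


def intercept(chunk: bytes, voice_sample: bytes) -> bytes:
    """Relay audio unchanged unless a trigger keyword appears;
    if one does, substitute a deepfaked replacement utterance."""
    text = transcribe(chunk)
    if any(kw in text.lower() for kw in TRIGGER_KEYWORDS):
        # The attacker swaps in their own details, rendered in the
        # victim's cloned voice by the synthesizer.
        fake = text.lower().replace("12345678", "99999999")
        return synthesize(fake, voice_sample)
    return chunk  # innocuous audio passes through untouched
```

In a real attack the stubs would be replaced by live generative-AI services, and `intercept` would sit inline between the two callers, processing each audio chunk as it arrives.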
Audio Jacking: Using Generative AI to Distort Live Audio Transactions: The rise of generative AI, including text-to-image, text-to-speech, and large language models (LLMs), has changed our work and personal lives significantly. While these advances… https://t.co/GR6AyVACZX pic.twitter.com/Ucw7tNeQCE
— Shah Sheikh (@shah_sheikh) February 1, 2024
The ease with which a proof of concept for this attack could be built is alarming. The researchers noted that the hardest part was not creating the AI, but the engineering work of capturing and processing the live audio. This ease of development is a significant departure from traditional expectations, under which such an attack would require considerable expertise across several computer science disciplines.
“Building this PoC was surprisingly and terrifyingly easy. We spent most of our time figuring out how to capture audio from the microphone and feed that audio to the generative AI.”
Generative AI plays a central role in this scheme. With just three seconds of a person's voice, the technology can create a convincing clone capable of producing authentic-sounding deepfakes on demand. That this capability is available through public APIs reflects a disturbing democratization of advanced attack tools.
“Now, by simply copying a person's voice for three seconds, you can generate authentic fake voices using text-to-speech APIs.”
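To illustrate how little input such services require, here is a minimal sketch of the request a voice-cloning text-to-speech API might accept. The field names and structure are assumptions for illustration, not any real provider's API:

```python
import base64


def build_clone_request(reference_clip: bytes, text: str) -> dict:
    """Package a roughly three-second voice sample plus the target
    text for a hypothetical voice-cloning TTS endpoint."""
    return {
        # Raw audio of the victim's voice, base64-encoded for JSON transport.
        "reference_audio_b64": base64.b64encode(reference_clip).decode("ascii"),
        # The words the cloned voice should speak.
        "text": text,
        "output_format": "wav",
    }
```

The point is that the attacker's entire input is a short clip and a string; everything else is handled by the hosted model.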
Impact and potential for misuse
The impact of audio jacking goes beyond financial fraud. The same technique could enable real-time censorship or the undetected alteration of live broadcasts, such as news reports and political speeches. These capabilities undermine the integrity of information and carry serious implications for democracy and public trust.
Audio jacking also dramatically lowers the barrier to sophisticated social engineering and phishing: attacks that once demanded deep expertise can now be mounted with off-the-shelf tools. This shift raises concerns about the proliferation of such attacks, challenging current security measures and demanding new defense mechanisms.
“The maturity of this PoC would suggest significant risks to consumers above all else…The more sophisticated this attack becomes, the wider the net of victims is likely to become.”
The phenomenon of audio jacking highlights a broader issue in AI development: the dual nature of generative technologies. On the one hand, they are a source of endless possibilities for innovation and creativity. On the other, the potential for their exploitation cannot be ignored. This research raises crucial questions about how society can harness the benefits of AI while protecting against its darker applications.
Navigating the future of AI security
As the digital threat landscape evolves, IBM Security's identification of audio jacking is an important warning to remain vigilant and modernize cybersecurity practices. Advances in countermeasures, such as innovative detection algorithms and stronger encryption techniques, are essential to combating these advanced threats.
Additionally, the disclosure makes clear that ethical considerations are integral to AI research and development. To address the dangers of such powerful technology, it is essential to establish rules and benchmarks for the responsible use of AI.