Meta's new 'Voicebox' AI is a text-to-speech tool that learns like ChatGPT

Oct 25, 2024

Meta AI recently unveiled a "groundbreaking" text-to-speech (TTS) generator that it claims produces results 20 times faster than comparable state-of-the-art AI models.

The new system, called Voicebox, eschews the traditional TTS architecture in favor of a model more akin to OpenAI's ChatGPT or Google's Bard. The main difference between Voicebox and similar TTS models (such as ElevenLabs Prime Voice AI) is that Meta's model can generalize to new tasks through in-context learning.

Much like ChatGPT or other transformer models, Voicebox uses a large-scale training dataset. Previous efforts to train on large amounts of audio data severely degraded the audio output. For this reason, most TTS systems use small, highly curated, labeled datasets.

Meta overcomes this limitation with a novel training scheme that forgoes labeling and curation in favor of an architecture that can "in-fill" audio information, predicting a missing stretch of speech from the audio around it. As Meta AI put it in a June 16 blog post, Voicebox is "the first model that can generalize to a speech generation task that it wasn't trained to do with state-of-the-art performance." This allows Voicebox to convert text to speech, remove unwanted noise by synthesizing replacement speech, and even apply a speaker's voice to output in a different language.
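The idea of filling in missing audio from its surroundings can be illustrated with a minimal sketch. This is not Meta's code: the function name make_infill_example, the list-of-floats frame representation, and the choice to zero out hidden frames are all assumptions made for illustration. It shows only how one unlabeled clip could be turned into a training example, by hiding a span and asking a model to regenerate it.

```python
from typing import List, Tuple

Frame = List[float]  # one audio feature frame (hypothetical representation)


def make_infill_example(
    frames: List[Frame], start: int, length: int
) -> Tuple[List[Frame], List[Frame], List[bool]]:
    """Hide frames[start:start+length] and return (context, target, mask).

    Hypothetical helper: a model would see `context` and be trained to
    regenerate `target`, so no human-written labels are needed.
    """
    if length <= 0 or start < 0 or start + length > len(frames):
        raise ValueError("masked span must lie inside the clip")
    # True marks a frame that is hidden from the model.
    mask = [start <= i < start + length for i in range(len(frames))]
    # Context: the clip with the hidden span zeroed out (an assumed convention).
    context = [[0.0] * len(f) if m else list(f) for f, m in zip(frames, mask)]
    # Target: only the frames the model is asked to fill back in.
    target = [list(f) for f, m in zip(frames, mask) if m]
    return context, target, mask


# A toy four-frame clip; real systems operate on spectrogram-like features.
clip = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
context, target, mask = make_infill_example(clip, start=1, length=2)
```

Because the target comes from the clip itself, any recording can serve as training data, which is one plausible reading of why such a scheme can do without curated labels.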

According to an accompanying research paper published by Meta, its pretrained Voicebox system can do all this using only the desired output text and a three-second audio clip.

Such robust speech generation arrives at a particularly sensitive time, as social media companies continue to struggle with moderation and, in the United States, the looming presidential election threatens to test the limits of online misinformation detection again. For example, former US President Donald Trump is currently facing allegations that he mishandled classified government material after leaving office. Among the alleged evidence cited in the case against him were audio recordings in which he allegedly admitted potential wrongdoing.

While there is no indication that the former president intends to deny what is described in the audio files, his case shows that data integrity is at the heart of the US legal system, and by extension, its democracy.

Voicebox isn't the first tool of its kind, but it appears to be one of the most powerful. For that reason, Meta developed a tool to determine whether speech was generated by Voicebox, which the company claims can "simply detect" the difference between real and fake audio. According to the blog post: "Like other powerful new AI innovations, we recognize that this technology presents the potential for misuse and unintended harm." The post adds that, as detailed in the accompanying paper, audio generated with Voicebox can be distinguished from authentic speech "to mitigate these possible future risks."

In the world of cryptocurrencies, artificial intelligence has become as integral to the day-to-day operations of most businesses as the internet or electricity. The largest exchanges rely on AI chatbots for customer interactions and sentiment analysis, and trading bots are already commonplace.