What is Meta’s Voicebox? Here’s a look at the new generative AI model for speech

'Voicebox' has the ability to create speech from text prompts, match audio from brief samples to mimic genuine voices, and remove background noise.

New Delhi, UPDATED: Jun 19, 2023 17:53 IST

Meta has introduced 'Voicebox', a generative AI model for speech

Highlights

‘Voicebox’ tool can help with audio editing, sampling, & noise reduction
It can help visually impaired people understand written information
The model is multilingual and can produce speech in six languages

On 16 June 2023, Meta introduced 'Voicebox,' a cutting-edge generative AI model meant to disrupt the voice production industry. The model can produce speech in a variety of languages and operates 'like an eraser for audio editing' to clean up recordings that would otherwise have been ruined by background noise or speaking blunders.

How is it different from other voice tools?

‘Voicebox’ is the sort of technology that could enable people to speak a foreign language in their own voice, serve as a go-to tool for voice-over artists, and help visually impaired people understand written information.

In-context text-to-speech synthesis

Given an audio sample as short as two seconds, Voicebox can match its audio style and use it to generate speech from text.

Diverse speech sampling

Voicebox was trained on a diverse set of data, resulting in speech that more closely reflects how people naturally speak in the real world across the six supported languages: English, French, German, Spanish, Polish, and Portuguese.

Speech editing and noise reduction

Without having to re-record the full speech, Voicebox can correct misspoken words or restore a section of speech that was drowned out by noise. For instance, you can clip a section of a conversation that was interrupted by a dog barking and tell Voicebox to recreate that section, acting as an eraser for audio editing.

Role of Voicebox in the metaverse

Meta has been committed to the preservation of languages all across the world. According to a blog post by Meta, "This type of technology could be used in the future to help creators easily edit audio tracks, enable people to speak any foreign language in their own voice, and allow visually impaired people to hear written messages from friends in their voices."

Additionally, Meta stated that its versatile generative AI models, like Voicebox, could provide a natural-sounding voice to virtual assistants or nonplayer characters (NPCs) in the metaverse, where people will come to work, play, and hang out.

‘Voicebox’ is currently under development and is not yet accessible to the public. Meta claims to be aware of the possible harm that this AI can cause and is working on an effective method of distinguishing between actual speech and audio created by Voicebox.