Facebook parent company Meta has announced two new tools for music and voice: MusicGen and Voicebox. The two generative AI models create lifelike music and speech from text prompts.
Voicebox is a generative AI model for text-to-speech that helps with audio editing, sampling, and styling. MusicGen is an open source text-to-music conversion AI comparable to Google’s MusicLM.
Meta claims Voicebox can create high-quality audio clips from scratch or edit pre-recorded samples. It can remove unwanted sounds like car horns and barking dogs while preserving the content and style of the audio, the company said in a blog post on Friday.
Voicebox: ChatGPT for Audio
Voicebox works much like OpenAI’s ChatGPT and Dall-E, but instead of generating poems or images, it creates audio clips from text prompts. The AI is trained on a large dataset of voice recordings and transcripts totaling over 50,000 hours.
This data includes public domain audiobooks in six languages: English, French, Spanish, German, Polish, and Portuguese. According to Meta researchers, the multilingual training data exposes Voicebox to a wide range of speakers and accents, giving it a deeper understanding of the nuances of each language.
“Our results show that speech recognition models trained on synthesized speech generated by Voicebox perform nearly as well as models trained on real speech,” the researchers said.
Meta said Voicebox can match the style of an audio sample as short as two seconds and use it to generate text-to-speech. The AI can also remove noise and misspoken words from a recording, allowing users to fix it without having to rerecord everything.
For example, if your dog barks in the middle of a speech, you can tell Voicebox to cut out the bark and regenerate the lost audio, much like an eraser for audio. In the future, the AI could be used to create custom voices for virtual assistants and characters in the metaverse, Meta said.
Meta said this type of technology could in the future enable creators to easily edit audio tracks, allow visually impaired people to hear messages from friends in their friends' voices, and let people speak any foreign language in their own voice.
Introducing Voicebox, a new breakthrough speech generation system based on flow matching, a new technique proposed by Meta AI. It can synthesize speech across six languages, perform noise reduction, edit content, transfer audio styles, and more.
Details and examples of this work
— Meta AI (@MetaAI) June 16, 2023
MusicGen rivals Google’s MusicLM
A week earlier, Meta also launched MusicGen, a text-to-music language model that can generate original music, similar to Google’s MusicLM. The model is open source, so anyone can freely use it to create anything from rock to pop music.
MusicGen is a Transformer-based music generation model that can create short pieces of new music (about 12 seconds) based on text prompts. Users specify the genre of music they want to generate and the mood they want to create, and MusicGen creates a new song based on that input.
According to Meta’s Audiocraft research team, the AI works by predicting the next section of music, much like a language model predicts the next letter in a sentence.
— Gabriel Synnaeve (@syhw) June 9, 2023
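The idea of predicting the next section of music from what came before can be illustrated with a toy example. The sketch below is not Meta's code and does not use MusicGen. It assumes a tiny made-up corpus of note names standing in for audio tokens, and a simple count-based model in place of a neural network. The names `train_bigram`, `generate`, and `corpus` are hypothetical and exist only for this illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, rng):
    """Extend `start` one token at a time, each drawn given the previous token."""
    out = [start]
    while len(out) < length:
        followers = counts.get(out[-1])
        if not followers:  # no known continuation, so stop early
            break
        tokens = list(followers)
        weights = [followers[t] for t in tokens]
        # Sample the next token in proportion to how often it followed the last one.
        out.append(rng.choices(tokens, weights=weights, k=1)[0])
    return out


# Hypothetical training data: note names standing in for discrete audio tokens.
corpus = [
    ["C", "E", "G", "C", "E", "G"],
    ["C", "E", "G", "E", "C"],
]
model = train_bigram(corpus)
melody = generate(model, start="C", length=8, rng=random.Random(0))
print(melody)
```

A real system of this kind would condition on far more context than the single previous token, as well as on the text prompt, but the generation loop has the same shape: predict, append, repeat.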
In a study, the researchers compared MusicGen to other music generation software such as Google’s MusicLM, Riffusion, Mousai and Noise2Music. They found that Noise2Music was able to produce more “plausible” results as measured by both objective and subjective metrics.
However, MusicGen scored highest on adherence to musical concepts, alignment between audio and text, and overall audio quality and accuracy as rated by humans. You can try MusicGen online on Facebook’s Hugging Face page.
A post about Meta unveiling MusicGen and Voicebox, AI tools for music and voice, first appeared on MetaNews.