Nvidia’s Fugatto AI Model Generates Music and Sounds from Text and Audio

Nvidia has unveiled Fugatto (Foundational Generative Audio Transformer Opus 1), a cutting-edge AI model capable of generating a diverse range of sounds, music, and even voices based on user-provided text and audio prompts. This innovative technology opens up new creative possibilities for musicians, producers, and content creators.

This new AI model can produce jingles and song snippets from simple text prompts, add or remove instruments and vocals from existing tracks, modify the accent and emotion of a voice, and even create entirely new sounds. As Nvidia explained in its announcement, Fugatto aims to understand and generate sound in a way that mimics human perception.

“We wanted to create a model that understands and generates sound like humans do,” stated Rafael Valle, Manager of Applied Audio Research at Nvidia. “Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale.”

Fugatto offers significant potential for music production, allowing producers to quickly prototype song ideas in various styles and arrangements, add effects and layers to existing tracks, and adapt music and voiceovers for different campaigns. The model can even adjust video game music dynamically as the player progresses through a level.

One of Fugatto’s most remarkable features is its ability to generate entirely new sounds, like “barking trumpets” or “meowing saxophones.” This is achieved through a technique called ComposableART, which combines instructions learned during the model’s training.

“I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one,” explained Nvidia AI researcher Rohan Badlani. “In my tests, the results were often surprising and made me feel a little bit like an artist, even though I’m a computer scientist.”
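
To make the idea of weighted attribute composition concrete, here is a minimal, hypothetical sketch of blending conditioning vectors by user-chosen emphasis. It is an illustration of the general technique only; the function names, embedding shapes, and weighting scheme are assumptions, not Nvidia's actual ComposableART implementation or API.

```python
import numpy as np

def compose_attributes(attribute_embeddings: dict[str, np.ndarray],
                       weights: dict[str, float]) -> np.ndarray:
    """Blend per-attribute conditioning vectors by user-chosen emphasis (hypothetical)."""
    total = sum(weights.values())
    # Normalize so the emphasis values are relative to one another.
    return sum((weights[name] / total) * emb
               for name, emb in attribute_embeddings.items())

# Example: emphasize "barking" twice as strongly as "trumpet".
rng = np.random.default_rng(0)
embeddings = {
    "barking": rng.standard_normal(512),   # placeholder attribute embedding
    "trumpet": rng.standard_normal(512),   # placeholder attribute embedding
}
conditioning = compose_attributes(embeddings, {"barking": 2.0, "trumpet": 1.0})
# In a real system, `conditioning` would steer the generative audio model;
# here it is just a weighted average of random vectors for illustration.
```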

Built with 2.5 billion parameters and trained on 32 H100 GPUs, Fugatto joins a growing field of generative audio AI. Stability AI launched a similar system in April, capable of generating three-minute tracks. Google’s V2A model can produce unlimited soundtracks for any video input.

YouTube recently introduced an AI music remixer that creates 30-second samples based on user prompts and input songs. Even OpenAI is exploring this area, releasing an AI tool that can clone a user’s voice and vocal patterns from just 15 seconds of sample audio.

Fugatto’s introduction highlights the rapid advancement of AI in audio generation and manipulation, promising new creative tools and possibilities for the future of music and sound design.
