Nvidia’s Fugatto AI Model Generates Music and Sounds from Text and Audio

Nvidia has unveiled Fugatto (Foundational Generative Audio Transformer Opus 1), a cutting-edge AI model capable of generating a diverse range of sounds, music, and even voices based on user-provided text and audio prompts. This innovative technology opens up new creative possibilities for musicians, producers, and content creators.

This new AI model can produce jingles and song snippets from simple text prompts, add or remove instruments and vocals from existing tracks, modify the accent and emotion of a voice, and even create entirely new sounds. As Nvidia explained in its announcement, Fugatto aims to understand and generate sound in a way that mimics human perception.

“We wanted to create a model that understands and generates sound like humans do,” stated Rafael Valle, Manager of Applied Audio Research at Nvidia. “Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale.”

Fugatto offers significant potential for music production, allowing producers to quickly prototype song ideas in various styles and arrangements, add effects and layers to existing tracks, and adapt music and voiceovers for different campaigns. The model can even adjust video game music dynamically as the player progresses through a level.

One of Fugatto’s most remarkable features is its ability to generate entirely new sounds, like “barking trumpets” or “meowing saxophones.” This is achieved through a technique called ComposableART, which combines instructions learned during the model’s training.

“I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one,” explained Nvidia AI researcher Rohan Badlani. “In my tests, the results were often surprising and made me feel a little bit like an artist, even though I’m a computer scientist.”
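
To make the idea of weighted attribute composition concrete, here is a minimal, hypothetical sketch of blending conditioning vectors by user-chosen emphasis. It is an illustration of the general technique only; the function names, embedding shapes, and weighting scheme are assumptions, not Nvidia's actual ComposableART implementation or API.

```python
import numpy as np

def compose_attributes(attribute_embeddings: dict[str, np.ndarray],
                       weights: dict[str, float]) -> np.ndarray:
    """Blend per-attribute conditioning vectors by user-chosen emphasis (hypothetical)."""
    total = sum(weights.values())
    # Normalize so the emphasis values are relative to one another.
    return sum((weights[name] / total) * emb
               for name, emb in attribute_embeddings.items())

# Example: emphasize "barking" twice as strongly as "trumpet".
rng = np.random.default_rng(0)
embeddings = {
    "barking": rng.standard_normal(512),   # placeholder attribute embedding
    "trumpet": rng.standard_normal(512),   # placeholder attribute embedding
}
conditioning = compose_attributes(embeddings, {"barking": 2.0, "trumpet": 1.0})
# In a real system, `conditioning` would steer the generative audio model;
# here it is just a weighted average of random vectors for illustration.
```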

Built with 2.5 billion parameters and trained on 32 H100 GPUs, Fugatto joins a growing field of generative audio AI. Stability AI launched a similar system in April, capable of generating three-minute tracks. Google’s V2A model can produce unlimited soundtracks for any video input.

YouTube recently introduced an AI music remixer that creates 30-second samples based on user prompts and input songs. Even OpenAI is exploring this area, releasing an AI tool that can clone a user’s voice and vocal patterns from just 15 seconds of sample audio.

Fugatto’s introduction highlights the rapid advancement of AI in audio generation and manipulation, promising new creative tools and possibilities for the future of music and sound design.
