Google's New AI Can Generate Any Type Of Music From Text

In case you haven't heard by now, we're barreling toward a glorious, Skynet-esque reality where AI feels dangerously close to sentience. We're not there yet, but things are changing in a hurry.

While AI-powered tools and tech have been around for years, the technology is advancing to the point where it's starting to threaten the reliance on human workers in many key industries, from fast food restaurants to software engineering. We've all heard about the wonders of ChatGPT lately, and it hasn't even reached its peak form yet.

Now, Google is showing us just how smart AI can be when it comes to creating music. A team of researchers at the tech giant has revealed MusicLM, an AI tool that can create music from instructional text. Others have tried this before, but MusicLM seems to represent a quantum leap in what these sorts of systems can do.

How MusicLM works

MusicLM is a machine learning model that can match descriptive text to thousands of sounds. Users can prompt the AI to create songs of varying lengths, with parameters describing the tempo, rhythm, and cultural influences they want, whether that's a game-ready 8-bit soundtrack or a thumping reggaeton anthem. You can even instruct it to add lyrics to the beat, though judging by the samples Google released, it generates little more than gibberish (albeit very on-beat gibberish).

If that weren't impressive enough, users can also feed it an audio sample of a whistled or hummed melody for the model to emulate. The AI can also generate music sequentially, giving users the ability to craft full songs whose sections ramp up or down in tempo. It takes all of these variables and seamlessly produces a full 24 kHz audio composition, ranging anywhere from 15 seconds to 5 minutes.

A whitepaper detailing the research says that MusicLM was built atop AudioLM, a model that can listen to a piece of music and attempt to emulate it. However, the project members note that a text-driven approach is a much tougher undertaking, as it's significantly harder to train a model to connect the complexities of sound to everyday written descriptions. The field also hasn't had the same enormous library of paired training samples to work from that image-based machine learning systems have, though it appears to be closing the gap.
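For a rough intuition of what that kind of pipeline involves, here's a minimal conceptual sketch of an AudioLM-style, text-conditioned generator: a prompt is turned into an embedding, a sequence model predicts discrete audio tokens conditioned on that embedding, and a codec decodes those tokens into a 24 kHz waveform. This is not Google's code or API; every function, constant, and name below is a placeholder standing in for a trained model component.

```python
# Conceptual sketch of an AudioLM-style, text-conditioned music generator.
# NOT Google's MusicLM implementation; all functions below are placeholder stubs.
import numpy as np

SAMPLE_RATE = 24_000          # MusicLM outputs 24 kHz audio
TOKENS_PER_SECOND = 50        # assumed codec frame rate, for illustration only
CODEBOOK_SIZE = 1024          # assumed number of discrete audio tokens

def embed_text(prompt: str) -> np.ndarray:
    """Stand-in for a joint text/music embedding model."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(128)

def generate_audio_tokens(text_embedding: np.ndarray, seconds: float) -> np.ndarray:
    """Stand-in for the autoregressive stage that predicts discrete audio tokens
    conditioned on the text embedding (the part inherited from AudioLM-style modeling)."""
    n_tokens = int(seconds * TOKENS_PER_SECOND)
    rng = np.random.default_rng(int(abs(text_embedding[0]) * 1e6))
    return rng.integers(0, CODEBOOK_SIZE, size=n_tokens)

def decode_to_waveform(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for a neural audio codec that turns tokens back into samples."""
    samples_per_token = SAMPLE_RATE // TOKENS_PER_SECOND
    return np.repeat(tokens / CODEBOOK_SIZE - 0.5, samples_per_token)

def text_to_music(prompt: str, seconds: float = 30.0) -> np.ndarray:
    embedding = embed_text(prompt)
    tokens = generate_audio_tokens(embedding, seconds)
    return decode_to_waveform(tokens)

waveform = text_to_music("a thumping reggaeton anthem with heavy bass", seconds=15)
print(waveform.shape)  # ~15 seconds of audio at 24 kHz -> (360000,)
```

The hard part, as the researchers point out, is the middle stage: learning which token sequences actually correspond to a phrase like "thumping reggaeton anthem" requires a large collection of audio paired with written descriptions.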

What's next for AI?

We can't play with MusicLM ourselves (it hasn't been released to the public), but the vast library of samples Google shared makes it clear there's still work to be done. While the characteristics of the songs do seem to accurately match the descriptions fed to the system, the resulting tracks aren't always coherent. That said, even in its current state, it could be a great tool for creating, say, royalty-free tracks for YouTube channels, or at least give users a solid starting point from which to draw inspiration.

But that's just wishful thinking for now: Google hasn't expressed any intention to open it up to public use. However, the company did release a dataset of roughly 5,500 music-text pairings for interested parties to toy with. It was created with the help of expert musicians, and while the dataset alone won't be enough for any experimental or commercial implementation of the model, it's a great starting point for similar projects.
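If you want to poke at those pairings yourself, the sketch below shows how you might load them with pandas once you've downloaded the dataset (released alongside the paper as MusicCaps). The filename and column names here are assumptions about how the CSV is laid out, so check them against the file you actually grab.

```python
# Quick look at a music-text pairing dataset distributed as a CSV.
# The filename and column names are assumptions; verify them against the real download.
import pandas as pd

df = pd.read_csv("musiccaps-public.csv")  # assumed filename

print(len(df))                  # expect roughly 5,500 rows
print(df.columns.tolist())      # inspect the actual column names

# Assuming a free-text description column named "caption":
if "caption" in df.columns:
    print(df["caption"].iloc[0])
```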

Experts widely believe that artificial intelligence will explode in fidelity and utility in 2023. Public interest has already reached a fever pitch, with tools like ChatGPT and Jasper.ai generating mainstream buzz. We can't predict a timeline for AI reaching the maturity it would need to take over nearly every part of human society, but it's clear by now that its progress isn't slowing down.