AWS DeepComposer – Compose Music with Generative Machine Learning Models
Today, we’re extremely happy to announce AWS DeepComposer, the world’s first musical keyboard combined with a generative AI service. Yes, you read that right.
Machine learning (ML) requires quite a bit of math, computer science, code, and infrastructure. These topics are exceedingly important but to a lot of aspiring ML developers, they look overwhelming and sometimes, dare I say it, boring.
To help everyone learn about practical ML and have fun doing it, we introduced several ML-powered devices. At AWS re:Invent 2017, we introduced AWS DeepLens, the world’s first deep learning-enabled camera, to help developers learn about ML for computer vision. Last year, we launched AWS DeepRacer, a fully autonomous 1/18th scale race car driven by reinforcement learning. This year, we’re raising the bar (pardon the pun).
Introducing AWS DeepComposer
AWS DeepComposer is a 32-key, 2-octave keyboard designed for developers to get hands-on with generative AI, using either pretrained models or their own.
You can request to get emailed when the device becomes available, or you can use a virtual keyboard in the AWS console.
Here’s the high-level view:
- Log into the DeepComposer console,
- Record a short musical tune, or use a prerecorded one,
- Select a generative model for your favorite genre, either pretrained or your own,
- Use this model to generate a new polyphonic composition,
- Play the composition in the console,
- Export the composition or share it on SoundCloud.
Let me show you how to quickly generate your first composition with a pretrained model. Then, I’ll discuss training your own model, and I’ll close the post with a primer on the underlying technology powering DeepComposer: Generative Adversarial Networks (GANs).
Using a Pretrained Model
Opening the console, I go to the Music Studio, where I can either select a prerecorded tune, or record one myself.
I go with the former, selecting Beethoven’s “Ode to Joy”.
I also select the pretrained model I want to use: classical, jazz, rock, or pop. These models have been trained on large music data sets for their respective genres, and I can use them directly. In the absence of ‘metal’ (watch out for that feature request, team), I pick ‘rock’ and generate the composition.
A few seconds later, I see the additional accompaniments generated by the model. I assign them different instruments: drums, overdriven guitar, electric guitar (clean), and electric bass (finger).
Here’s the result. What do you think?
Finally, I can export the composition to a MIDI or MP3 file, and share it on my SoundCloud account. Fame awaits!
Training Your Own Model
I can also train my own model on a data set for my favorite genre. I need to select:
- Architecture parameters for the Generator and the Discriminator (more on what these are in the next section),
- The loss function used during training to measure the difference between the output of the algorithm and the expected value,
- A validation sample that I’ll be able to listen to while the model is trained.
During training, I can see quality metrics, and I can listen to the validation sample selected above. Once the model has been fully trained, I can use it to generate compositions, just like for pretrained models.
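To make the idea of a loss function concrete, here is a minimal sketch in plain Python. It assumes binary cross-entropy, a common choice for a model that outputs a probability; this post does not say which loss functions DeepComposer actually offers, so treat the function name and the sample values as purely illustrative.

```python
import math


def binary_cross_entropy(predictions, targets, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets.

    Illustrative only: not necessarily a loss that DeepComposer exposes.
    """
    total = 0.0
    for p, t in zip(predictions, targets):
        # Clamp the prediction so that log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(predictions)


# Same targets, two sets of predictions: one mostly right, one mostly wrong.
confident_and_right = binary_cross_entropy([0.9, 0.1], [1, 0])
confident_and_wrong = binary_cross_entropy([0.1, 0.9], [1, 0])
```

The loss is small when the predictions agree with the expected values and grows quickly when they confidently disagree, which is the signal that training tries to minimize.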
A Primer on Generative Adversarial Networks
GANs saw the light of day in 2014, with the publication of “Generative Adversarial Networks” by Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville and Yoshua Bengio.
In the authors’ words:
In the proposed adversarial nets framework, the generative model is pitted against an adversary: a discriminative model that learns to determine whether a sample is from the model distribution or the data distribution. The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles.
Let me expand on this a bit:
- The Generator has no access to the data set. Using random data, it creates samples that are forwarded through the Discriminator model.
- The Discriminator is a binary classification model, learning how to distinguish genuine data samples (included in the training set) from fake samples (made up by the Generator). The training process uses traditional techniques like gradient descent, backpropagation, etc.
- As the Discriminator learns, its weights are updated.
- The Discriminator’s error signal is also propagated back to the Generator, whose own weights are updated in turn. This is the key to understanding GANs: by applying these updates, the Generator progressively learns how to generate samples that are closer and closer to the ones that the Discriminator considers as genuine.
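The loop described above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the models behind DeepComposer: the "data set" is a one-dimensional Gaussian centered on 4, the Generator is a hypothetical linear map `G(z) = a*z + b`, the Discriminator is a logistic classifier `D(x) = sigmoid(w*x + c)`, and the Generator uses the commonly used non-saturating loss `-log D(G(z))`. All names (`train_toy_gan`, `a`, `b`, `w`, `c`) are made up for illustration, and the gradients are written out by hand so that no ML library is needed.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, batch=16, lr=0.05, seed=0):
    """Train a tiny 1-D GAN; returns ((a, b), (w, c))."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # Generator parameters (hypothetical): G(z) = a*z + b
    w, c = 0.1, 0.0   # Discriminator parameters: D(x) = sigmoid(w*x + c)

    for _ in range(steps):
        # Only the Discriminator ever sees real samples.
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: binary classification, real -> 1, fake -> 0.
        # For cross-entropy on a sigmoid, dLoss/dlogit = D(x) - label.
        grad_w = grad_c = 0.0
        for x, label in [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]:
            err = sigmoid(w * x + c) - label
            grad_w += err * x
            grad_c += err
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: the error flows back through the Discriminator.
        # Loss is -log D(G(z)), so dLoss/dlogit = D(G(z)) - 1, and the
        # chain rule gives dlogit/da = w*z and dlogit/db = w.
        grad_a = grad_b = 0.0
        for z in noise:
            err = sigmoid(w * (a * z + b) + c) - 1.0
            grad_a += err * w * z
            grad_b += err * w
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch

    return (a, b), (w, c)
```

Calling `train_toy_gan()` returns the learned Generator and Discriminator parameters. Note that the Generator never touches `real`: everything it learns about the data arrives through `w` and `c`. Tiny GANs like this one can oscillate rather than settle, so treat the result as a demonstration of the mechanics and not as a benchmark.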
To sum things up, you have to train as a counterfeiting expert in order to become a great counterfeiter… but don’t take this as career advice! If you’re curious to learn more, you may like this post from my own blog, explaining how to generate MNIST samples with an Apache MXNet GAN.
If you just want to play music and have fun like this little fellow, that’s fine too!
- Julien