AWS Machine Learning Blog
Creating a music genre model with your own data in AWS DeepComposer
AWS DeepComposer is an educational AWS service that teaches generative AI and uses Generative Adversarial Networks (GANs) to transform a melody that you provide into a completely original song. With AWS DeepComposer, you can use one of the pre-trained music genre models (such as Jazz, Rock, Pop, Symphony, or Jonathan-Coulton) or train your own. As a part of training your custom music genre model, you store your music data files in NumPy objects. This post accompanies the training steps in Lab 2 – Train a custom GAN model on GitHub and demonstrates how to convert your MIDI files to the proper training format for AWS DeepComposer.
For this use case, you use your own MIDI files to train a Reggae music genre model. Reggae music comes from the island of Jamaica and typically uses bass guitars, drums, and percussion instruments. However, the steps in this post are general enough to work with any music genre.
Data processing to produce training data
MIDI (.mid) files are the starting point for your training data. These files, which music software produces and reads, contain data about musical notes and playback sound. As a part of data processing, you need to translate your MIDI files to NumPy arrays and persist them to disk in a single .npy file. The following diagram shows the conversion process.
Although .csv files are widely used in machine learning to store data, .npy files are highly optimized for fast reading during the training process. The final shape of the .npy file should be (x, 32, 128, 4), which represents (number of samples, number of time steps per sample, pitch range, instruments).
To convert your MIDI files to the proper format, you complete the following steps:
- Read in the MIDI file to produce a Multitrack object.
- Determine which four instrument tracks to use.
- Retrieve the pianoroll matrix for each track and reshape it to the proper format.
- Concatenate the pianoroll objects for a given instrument and store them in the .npy file.
Reading in the MIDI file to produce a Multitrack object
The first step in data processing is to parse each MIDI file to produce a Multitrack object. The following diagram shows a Multitrack object.
The library that assists with this process is Pypianoroll, which provides functionality to read and write MIDI files from within Python code. See the following code:
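The snippet below is a minimal sketch: the file name reggae_song.mid is a placeholder, and older Pypianoroll releases expose the parser as pypianoroll.parse while newer ones use pypianoroll.read.

```python
import pypianoroll

# Parse one MIDI file into a Multitrack object.
# Replace 'reggae_song.mid' with the path to one of your own MIDI files.
music_tracks = pypianoroll.parse('reggae_song.mid')  # pypianoroll.read(...) on newer versions
print(music_tracks)
```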
music_tracks is a Multitrack object that holds a list of Track objects read from your MIDI file. Each Multitrack object contains a tempo, downbeat, beat resolution, and name, as shown in the following code:
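The attribute names below follow Pypianoroll 0.5.x, where the resolution is exposed as beat_resolution; newer releases rename it to resolution.

```python
# Inspect the parsed Multitrack object.
print('Tempo:', music_tracks.tempo)
print('Downbeat:', music_tracks.downbeat)
print('Beat resolution:', music_tracks.beat_resolution)
print('Name:', music_tracks.name)
print('Number of tracks:', len(music_tracks.tracks))
```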
Determining which four instrument tracks to use
If your parsed Multitrack object contains exactly four instruments, you can skip this step.
The preceding parsed Multitrack object has a total of seven instrument tracks (fretless (electric bass), organ, clavinet, muted guitar, clean guitar, vibraphone, and drums). The more instruments the model has to learn from, the longer and more costly training becomes. For that reason, the chosen GAN supports only four instruments. If your MIDI file contains more than four instrument tracks, train your model with the tracks from the four most popular instruments in your chosen genre. Conversely, if your MIDI files have fewer than four instrument tracks, you need to augment them. The incorrect number of instruments causes NumPy shape errors.
Each instrument on a given track has a program number. The program number is dictated by the General MIDI specification and is like a unique identifier for the instrument. The following code example pulls out the piano, organ, bass, and guitar instruments using the associated program numbers:
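This is a sketch rather than the lab’s exact code: the get_tracks_by_program helper is illustrative, and the program ranges you keep should match the instruments in your own MIDI files.

```python
# General MIDI groups instruments by 0-based program number:
# piano 0-7, organ 16-23, guitar 24-31, bass 32-39.
def get_tracks_by_program(multitrack, program_range):
    return [track for track in multitrack.tracks
            if track.program in program_range and not track.is_drum]

piano_tracks = get_tracks_by_program(music_tracks, range(0, 8))
organ_tracks = get_tracks_by_program(music_tracks, range(16, 24))
guitar_tracks = get_tracks_by_program(music_tracks, range(24, 32))
bass_tracks = get_tracks_by_program(music_tracks, range(32, 40))
```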
Retrieving the pianoroll matrix for each track and reshaping it to the proper format
The Multitrack object has a single Track object per instrument. Each Track object contains a pianoroll matrix, a program number, a Boolean indicating whether the track is a drum track, and a name.
The following code example shows a single Track:
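The example below is a sketch that prints the attributes of the first piano track from the illustrative filtering step above.

```python
track = piano_tracks[0]
print('Name:', track.name)
print('Program:', track.program)
print('Is drum:', track.is_drum)
print('Pianoroll shape:', track.pianoroll.shape)  # for example, (512, 128)
```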
For training, a single pianoroll object should have 32 discrete time steps that represent a snippet of a song and 128 pitches. The starting shape of the pianoroll objects for the chosen instrument tracks is (512, 128), which you need to reshape to the proper format. Each pianoroll object is reshaped to two bars (32 time steps) with a pitch range of 128. After running the following code, the final shape for a single pianoroll object is (16, 32, 128):
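This is a minimal sketch of the reshape, assuming the track’s pianoroll has 512 time steps.

```python
import numpy as np

pianoroll = track.pianoroll                      # shape (512, 128)
snippets = np.reshape(pianoroll, (-1, 32, 128))  # 32 time steps per sample
print(snippets.shape)                            # (16, 32, 128)
```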
For brevity, the following code example shows a sample of what’s produced for piano tracks only:
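The sketch below collects the reshaped snippets for the piano tracks; trimming each pianoroll to a multiple of 32 time steps is an assumption that keeps the reshape valid.

```python
piano_rolls = []
for track in piano_tracks:
    # Keep only whole 32-time-step snippets.
    usable_steps = (track.pianoroll.shape[0] // 32) * 32
    if usable_steps > 0:
        trimmed = track.pianoroll[:usable_steps]
        piano_rolls.append(np.reshape(trimmed, (-1, 32, 128)))
```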
Concatenating the pianoroll objects for a given instrument and storing them in the .npy file
The next step is to concatenate all the tracks, per instrument, and store them in the .npy training file. You can think of this process as stacking the pianoroll objects on top of one another. You need to repeat this process for each of your four chosen instruments. See the following code:
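This is a sketch of the stacking step, assuming organ_rolls, guitar_rolls, and bass_rolls were built the same way as piano_rolls above.

```python
# Concatenate all reshaped pianorolls for each instrument along the sample axis.
piano_data = np.concatenate(piano_rolls, axis=0)
organ_data = np.concatenate(organ_rolls, axis=0)
guitar_data = np.concatenate(guitar_rolls, axis=0)
bass_data = np.concatenate(bass_rolls, axis=0)
print(piano_data.shape)  # (number of samples, 32, 128)
```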
You now store the merged piano rolls in the .npy file. See the following code:
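The following is a minimal sketch; it assumes every instrument contributes the same number of samples, which in practice means aligning the instruments per song before combining them.

```python
# Combine the four instruments into (number of samples, 32, 128, 4)
# and persist the result to disk.
train_data = np.stack([piano_data, organ_data, guitar_data, bass_data], axis=-1)
print(train_data.shape)  # (x, 32, 128, 4)
np.save('reggae-train.npy', train_data)
```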
Results
The code produces reggae-train.npy based on a set of MIDI files stored in your_midi_file_directory. The Jupyter notebook and full code are available on the GitHub repo.
Now that you have your training data file, follow Lab 2 – Train a custom GAN model in the AWS DeepComposer samples notebook to train your custom music genre model.
This post provides two AI-generated, reggae-inspired tracks on SoundCloud: Summer Breeze and Mellow Vibe.
Tips and tricks
You can use the following tips and tricks to understand your data and produce the best sounding music.
Viewing and listening to MIDI files in GarageBand
If you have a Mac, you can use GarageBand to listen to your MIDI files and view the accompanying instruments. If you don’t have a Mac, you can use any other Digital Audio Workstation (DAW) that supports MIDI files. The sound quality is much better when listening to AI-generated music through GarageBand. You can even attach pro-grade USB speakers to amplify the sound.
Using program numbers to change accompanying instruments
When running inference code from Lab 2 – Train a custom GAN model, you may notice that all the AI-generated tracks appear as “Steinway Grand Piano” in GarageBand. If you’re familiar with the AWS DeepComposer console, you can change the accompanying instruments. To change the accompanying instruments when training a custom model, use the programs parameter when calling the save_pianoroll_as_midi function found in midi_utils. See the following code:
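Only the programs parameter comes from midi_utils in the Lab 2 samples; the generated_pianoroll variable and the rest of the call are assumptions, so check that module for the exact signature before running this sketch.

```python
from midi_utils import save_pianoroll_as_midi

# Map each of the four generated tracks to a General MIDI instrument:
# 0 = piano, 16 = organ, 24 = guitar, 32 = bass (0-based program numbers).
save_pianoroll_as_midi(generated_pianoroll, programs=[0, 16, 24, 32])
```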
Using GarageBand to add additional accompaniments
When you have an AI-generated song (with accompaniments), you can use GarageBand (or a similar tool) to add additional accompaniments. You can adjust the tempo (or speed) of your track and even mute certain instruments. You can also create a unique sound by adding as many accompanying instruments as you’d like.
Creating inference melody on the AWS DeepComposer console
When running inference, you need a custom melody in MIDI format. You use your custom melody to generate a unique song by adding accompanying instruments. The easiest way for you to create a melody when training a custom model is to use the AWS DeepComposer console. You can record your melody using the virtual keyboard or an AWS DeepComposer keyboard and download it as a MIDI file by choosing Download.
Plotting pianoroll using matplotlib
You can plot a pianoroll object using the plot function on the Track. This gives you a visual representation of your pianoroll object. See the following code:
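This is a minimal sketch, assuming piano_tracks[0] is one of the Track objects from your parsed Multitrack and matplotlib is installed.

```python
import matplotlib.pyplot as plt

track = piano_tracks[0]
track.plot()   # Track.plot draws the pianoroll with matplotlib
plt.show()
```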
The following plot shows what a pianoroll object looks like.
Binarizing the data
The code contains a section to binarize the data. This update is important because the model operates on -1 and 1 instead of 0 and 1 when dealing with binary input. track_list contains the final training data, which you should set to either -1 or 1 before persisting to reggae-train.npy. See the following code:
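This is a sketch of the binarization, assuming track_list is the stacked NumPy training array.

```python
# Silent cells become -1, any played note (velocity > 0) becomes 1.
track_list = np.where(track_list > 0, 1, -1).astype(np.float32)
np.save('reggae-train.npy', track_list)
```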
Conclusion
AWS DeepComposer is more than an ordinary keyboard; it’s a fun and engaging way to learn about generative AI and the complexities of GANs. You can even learn to play simple melodies that might serve as the inspiration for brand-new songs. The ability to train your own custom music genre model allows you to create a sound that is totally unique. You can even fuse two genres to create a brand-new genre!
Look out for my upcoming free web series, “AWS DeepComposer: Train it Again Maestro,” on A Cloud Guru. This six-episode web series teaches you about machine learning, music terminology, and how to generate songs using generative AI and AWS DeepComposer. The series wraps up with a head-to-head “Battle of the DeepComposers,” in which my AI-generated song goes up against another instructor’s song and you vote for your favorite. Check it out!
You can also follow me on SoundCloud to hear my AI-generated music inspired by my favorite artists or connect with me on LinkedIn.
About the Author
Kesha Williams is an AWS Machine Learning Hero, Alexa Champion, award-winning software engineer, and training architect at A Cloud Guru with 25 years’ experience in IT.