AWS Machine Learning Blog

Creating a music genre model with your own data in AWS DeepComposer

AWS DeepComposer is an educational AWS service that teaches generative AI and uses Generative Adversarial Networks (GANs) to transform a melody that you provide into a completely original song. With AWS DeepComposer, you can use one of the pre-trained music genre models (such as Jazz, Rock, Pop, Symphony, or Jonathan-Coulton) or train your own. As a part of training your custom music genre model, you store your music data files in NumPy objects. This post accompanies the training steps in Lab 2 – Train a custom GAN model on GitHub and demonstrates how to convert your MIDI files to the proper training format for AWS DeepComposer.

For this use case, you use your own MIDI files to train a Reggae music genre model. Reggae music comes from the island of Jamaica and typically uses bass guitars, drums, and percussion instruments. However, the steps in this post are general enough to work with any music genre.

Data processing to produce training data

MIDI (.mid) files are the starting point for your training data. Music software produces (and reads) these files, which contain data about musical notes and playback sound. As part of data processing, you need to translate your MIDI files into NumPy arrays and persist them to disk in a single .npy file. The following diagram shows the conversion process.

Although .csv files are widely used to store data for machine learning, .npy files are highly optimized for fast reading during the training process. The final shape of the data in the .npy file should be (x, 32, 128, 4), which represents (number of samples, number of time steps per sample, pitch range, instruments).
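If the .npy format is new to you, the following minimal sketch (the array contents and file name are placeholders, not real training data) shows how a NumPy array in the target shape is written to and read back from an .npy file:

import numpy as np

#placeholder array in the target format:
#(number of samples, time steps per sample, pitch range, instruments)
placeholder = np.zeros((2, 32, 128, 4))
np.save('example.npy', placeholder)

#reading the file back restores the same shape
print(np.load('example.npy').shape)  #(2, 32, 128, 4)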

To convert your MIDI files to the proper format, you complete the following steps:

  1. Read in the MIDI file to produce a Multitrack object.
  2. Determine which four instrument tracks to use.
  3. Retrieve the pianoroll matrix for each track and reshape it to the proper format.
  4. Concatenate the pianoroll objects for a given instrument and store them in the .npy file.

Reading in the MIDI file to produce a Multitrack object

The first step in data processing is to parse each MIDI file to produce a Multitrack object. The following diagram shows a Multitrack object.

The library that assists with this process is Pypianoroll, which provides functionality to read and write MIDI files from within Python code. See the following code:

import pypianoroll

#initialize a Multitrack object with a beat resolution of 4
music_tracks = pypianoroll.Multitrack(beat_resolution=4)

#load the MIDI file using parse_midi, which populates the
#Multitrack object with Track objects
music_tracks.parse_midi(your_midi_file_directory + your_midi_filename)

music_tracks is a Multitrack object that holds a list of Track objects read from your MIDI file. Each Multitrack object also contains a tempo, downbeat, beat resolution, and name, as shown in the following output:

tracks: [FRETLSSS, ORGAN 2, CLAVINET, MUTED GTR, CLEAN GTR, VIBRAPHONE, DRUMS],
tempo: [120. 120. 120. ... 120. 120. 120.],
downbeat: [ True False False False False False False False False . . .
 False False False False False False False False False False False False],
beat_resolution: 4,
name: "reggae1"

Determining which four instrument tracks to use

If your parsed Multitrack object contains exactly four instruments, you can skip this step.

The preceding parsed Multitrack object has a total of seven instrument tracks: fretless (electric) bass, organ, clavinet, muted guitar, clean guitar, vibraphone, and drums. The more instruments the model has to learn from, the longer and more costly training becomes, so the GAN used in this lab supports only four instruments. If your MIDI file contains more than four instrument tracks, train your model with the tracks from the four most popular instruments in your chosen genre. Conversely, if your MIDI files contain fewer than four instrument tracks, you need to augment them; an incorrect number of instruments causes NumPy shape errors.

Each instrument on a given track has a program number. The program number is dictated by the General MIDI specification and is like a unique identifier for the instrument. The following code example pulls out the piano, organ, bass, and guitar instruments using the associated program numbers:

#program numbers (per the General MIDI specification) for each instrument family
instrument1_program_numbers = [1,2,3,4,5,6,7,8]         #Piano
instrument2_program_numbers = [17,18,19,20,21,22,23,24] #Organ
instrument3_program_numbers = [33,34,35,36,37,38,39,40] #Bass
instrument4_program_numbers = [25,26,27,28,29,30,31,32] #Guitar

#group candidate tracks by instrument family
collection = {'Piano': [], 'Organ': [], 'Bass': [], 'Guitar': []}

for track in music_tracks.tracks:
    if track.program in instrument1_program_numbers:
        collection['Piano'].append(track)
    elif track.program in instrument2_program_numbers:
        collection['Organ'].append(track)
    elif track.program in instrument3_program_numbers:
        collection['Bass'].append(track)
    elif track.program in instrument4_program_numbers:
        collection['Guitar'].append(track)
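After grouping the tracks, you still need to narrow each group down to a single track and handle MIDI files that are missing one of the four instrument families. The following is a minimal sketch of one way to do this; keeping the first matching track and padding a missing instrument with an empty (all-zero) pianoroll are illustrative choices, not part of the lab code:

import numpy as np
from pypianoroll import Track

#pad to the same number of time steps as the other tracks in this file
num_time_steps = music_tracks.tracks[0].pianoroll.shape[0]

chosen_tracks = []
for instrument in ['Piano', 'Organ', 'Bass', 'Guitar']:
    if collection[instrument]:
        #keep the first track found for this instrument family
        chosen_tracks.append(collection[instrument][0])
    else:
        #augment: add an empty pianoroll so the file still contributes four instruments
        empty_pianoroll = np.zeros((num_time_steps, 128), dtype=np.uint8)
        chosen_tracks.append(Track(pianoroll=empty_pianoroll, program=0, is_drum=False, name='EMPTY'))

chosen_tracks is then the list of exactly four tracks that the reshaping code in the next step loops over.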

Retrieving the pianoroll matrix for each track and reshaping it to the proper format

The Multitrack object has a single Track object per instrument. Each Track object contains a pianoroll matrix, a program number, a Boolean indicating whether the track is a drum track, and a name.

The following output shows a single Track:

pianoroll:
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]],
program: 7,
is_drum: False,
name: CLAVINET

For training, a single pianoroll object should have 32 discrete time steps (representing a snippet of a song) and 128 pitches. The starting shape of the pianoroll object for each chosen instrument track is (512, 128), which you need to reshape to the proper format. Each pianoroll object is reshaped into two-bar (32 time step) chunks with a pitch range of 128. After running the following code, the final shape for a single pianoroll object is (16, 32, 128):

#loop through the chosen tracks
for index, track in enumerate(chosen_tracks):
    try:
        #reshape the pianoroll into 2-bar (32 time step) chunks
        track.pianoroll = track.pianoroll.reshape(-1, 32, 128)

        #store the reshaped pianoroll per instrument
        reshaped_piano_roll_dict = store_track(track, reshaped_piano_roll_dict)
    except Exception as e:
        print("ERROR!!!!!----> Skipping track # ", index, " with error ", e)

For brevity, the following code example shows a sample of what’s produced for piano tracks only:

{'Piano': [Track(pianoroll=array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]], dtype=uint8), program=7, is_drum=False, name=CLAVINET)]

Concatenating the pianoroll objects for a given instrument and storing them in the .npy file

The next step is to concatenate all the tracks, per instrument, and store them in the .npy training file. You can think of this process as stacking the pianoroll objects on top of one another. You need to repeat this process for each of your four chosen instruments. See the following code:

def get_merged(music_tracks, filename):

    ...

    #holds the merged pianoroll for each of the four instruments
    merge_piano_roll_list = []

    for instrument in reshaped_piano_roll_dict:
        try:
            merged_pianorolls = np.empty(shape=(0, 32, 128))

            #concatenate/stack all tracks for a single instrument
            if reshaped_piano_roll_dict[instrument]:
                merged_pianorolls = np.stack([track.pianoroll for track in reshaped_piano_roll_dict[instrument]], -1)
                merged_pianorolls = merged_pianorolls[:, :, :, 0]
                merged_piano_rolls = np.any(merged_pianorolls, axis=0)
                merge_piano_roll_list.append(merged_piano_rolls)
        except Exception as e:
            print("ERROR!!!!!----> Cannot concatenate/merge track for instrument", instrument, " with error ", e)

    merge_piano_roll_list = np.stack([track for track in merge_piano_roll_list], -1)
    return merge_piano_roll_list.reshape(-1, 32, 128, 4)

You now store the merged piano rolls in the .npy file. See the following code:

import numpy as np

#holds the final reshaped tracks that are saved to the training .npy file
track_list = np.empty(shape=(0, 32, 128, 4))

...

#merge pianoroll objects by instrument
merged_tracks_to_add_to_training_file = get_merged(music_tracks, filename)

#concatenate the merged pianoroll objects to the final training data track list
track_list = np.concatenate((merged_tracks_to_add_to_training_file, track_list))

#binarize the data: the model expects -1 and 1 instead of 0 and 1
track_list[track_list == 0] = -1
track_list[track_list >= 0] = 1

#save the training data to reggae-train.npy
np.save(train_dir + '/reggae-train.npy', np.array(track_list))

Results

The code produces reggae-train.npy based on a set of MIDI files stored in your_midi_file_directory. The Jupyter notebook and full code are available on the GitHub repo.
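Before moving on to training, it can help to sanity check the generated file. The following is a minimal sketch (assuming the same train_dir used when saving the file) that confirms the expected shape and the binarized values:

import numpy as np

train_data = np.load(train_dir + '/reggae-train.npy')

#expect (number of samples, 32, 128, 4)
print(train_data.shape)

#after binarizing, the data should contain only -1 and 1
print(np.unique(train_data))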

Now that you have your training data file, follow Lab 2 – Train a custom GAN model in the AWS DeepComposer samples notebook to train your custom music genre model.

This post provides two AI-generated, reggae-inspired tracks on SoundCloud: Summer Breeze and Mellow Vibe.

Tips and tricks

You can use the following tips and tricks to understand your data and produce the best sounding music.

Viewing and listening to MIDI files in GarageBand

If you have a Mac, you can use GarageBand to listen to your MIDI files and view the accompanying instruments. If you don’t have a Mac, you can use any other Digital Audio Workstation (DAW) that supports MIDI files. The sound quality is much better when listening to AI-generated music through GarageBand. You can even attach pro-grade USB speakers to amplify the sound.

Using program numbers to change accompanying instruments

When running inference code from Lab 2 – Train a custom GAN model, you may notice that all the AI-generated tracks appear as “Steinway Grand Piano” in GarageBand. If you’re familiar with the AWS DeepComposer console, you can change the accompanying instruments. To change the accompanying instruments when training a custom model, use the programs parameter when calling the save_pianoroll_as_midi function found in midi_utils. See the following code:

#use programs to provide the program numbers for the instruments you care about
#17 = Drawbar Organ, 28 = Electric Guitar (clean), 27 = Electric Guitar (jazz), 11 = Music Box
midi_utils.save_pianoroll_as_midi(fake_sample_x[:4], programs=[17, 28, 27, 11], destination_path=destination_path)

Using GarageBand to add additional accompaniments

When you have an AI-generated song (with accompaniments), you can use GarageBand (or a similar tool) to add additional accompaniments. You can adjust the tempo (or speed) of your track and even mute certain instruments. You can also create a unique sound by adding as many accompanying instruments as you’d like.

Creating inference melody on the AWS DeepComposer console

When running inference, you need a custom melody in MIDI format. You use your custom melody to generate a unique song by adding accompanying instruments. The easiest way for you to create a melody when training a custom model is to use the AWS DeepComposer console. You can record your melody using the virtual keyboard or an AWS DeepComposer keyboard and download it as a MIDI file by choosing Download.

Plotting pianoroll using matplotlib

You can plot a pianoroll object using the plot function on the Track. This gives you a visual representation of your pianoroll object. See the following code:

import matplotlib.pyplot as plt

...

fig, ax = track.plot()
plt.show()

The following plot shows what a pianoroll object looks like.

Binarizing the data

The code contains a section to binarize the data. This step is important because the model operates on -1 and 1 instead of 0 and 1 when dealing with binary input. track_list contains the final training data, so you should set its values to either -1 or 1 before persisting it to reggae-train.npy. Note that the order of the two assignments matters: zeros are first mapped to -1, and the remaining positive values are then mapped to 1. See the following code:

# binarize data
track_list[track_list == 0] = -1
track_list[track_list >= 0] = 1

Conclusion

AWS DeepComposer is more than an ordinary keyboard; it's a fun and engaging way to learn about generative AI and the complexities of GANs. You can even learn to play simple melodies that might serve as the inspiration for brand-new songs. The ability to train your own custom music genre model allows you to create a sound that is totally unique. You can even fuse two genres to create a brand-new genre!

Look out for my upcoming free web series, "AWS Deep Composer: Train it Again Maestro," on A Cloud Guru. This six-episode web series teaches you about machine learning, music terminology, and how to generate songs using generative AI and AWS DeepComposer. The series wraps up with a head-to-head "Battle of the DeepComposers," in which my AI-generated song goes up against another instructor's song and you vote for your favorite. Check it out!

You can also follow me on SoundCloud to hear my AI-generated music inspired by my favorite artists or connect with me on LinkedIn.


About the Author

Kesha Williams is an AWS Machine Learning Hero, Alexa Champion, award-winning software engineer, and training architect at A Cloud Guru with 25 years’ experience in IT.