The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.
Its purposes are:
- To encourage research on algorithms that scale to commercial sizes
- To provide a reference dataset for evaluating research
- To provide a shortcut alternative to creating a large dataset with The Echo Nest's API
- To help new researchers get started in the MIR field
Most of the information is provided by The Echo Nest. The dataset is the result of a collaboration between The Echo Nest and LabROSA at Columbia University, supported in part by the NSF. This data was uploaded by Infochimps. The Million Song collection is also cataloged from A to Z on Infochimps; search for “Million Song Dataset” to find it.