Tagatune Dataset

15 Mar

Shameless copy and paste from

The Tagatune Dataset

Edith L.M. Law has just released the long-awaited Tagatune Dataset.

From the README:

The Tagatune dataset consists of 31383 music clips that are 29 seconds long, created from songs downloadable from The genres include classical, new age, electronica, rock, pop, world, jazz, blues, metal, punk, etc. The dataset is optimized for training machine learning algorithms, i.e. it includes tags that are associated with more than fifty songs, and each song is associated with a tag only if that tag has been generated by more than two players independently.

The data is collected from a two-player online game called Tagatune, deployed on the game portal. In this game, two players are given either the same song or different songs, and are asked to enter descriptions appropriate for their given song. After reviewing each other's descriptions, the players then guess whether they are given the same song or not.
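
For readers who want to apply the same kind of filtering to their own game logs, here is a minimal sketch of the rule the README describes: keep a tag on a clip only if more than two distinct players entered it, and keep a tag overall only if it ends up on more than fifty clips. The input format (a list of `(clip_id, tag, player_id)` triples), the function name `filter_annotations`, and the order in which the two filters are applied are all assumptions made for illustration; the README does not say how the released data was actually produced.

```python
from collections import Counter, defaultdict


def filter_annotations(raw, min_players=3, min_songs=51):
    """Filter raw tag entries by player agreement and tag frequency.

    raw: iterable of (clip_id, tag, player_id) triples. This layout is a
    hypothetical input format, not the format of the released dataset.
    """
    # Collect the distinct players who entered each tag for each clip.
    players = defaultdict(set)
    for clip_id, tag, player_id in raw:
        players[(clip_id, tag)].add(player_id)

    # Keep a (clip, tag) pair only if more than two players entered it.
    agreed = {pair for pair, who in players.items() if len(who) >= min_players}

    # Count how many clips each tag is attached to after the agreement filter.
    # Applying the agreement filter first is an assumption.
    clips_per_tag = Counter(tag for _, tag in agreed)

    # Keep only tags attached to more than fifty clips.
    return {(clip, tag) for clip, tag in agreed if clips_per_tag[tag] >= min_songs}
```

The defaults mirror the README's wording ("more than two players", "more than fifty songs"); both thresholds are parameters so the sketch can be tried on a small sample.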

This is great data, useful for all sorts of things, especially research around autotagging and query-by-description. It is quite complementary to a dataset that we are about to release from the Echo Nest (stay tuned for that).


Posted by on March 15, 2009 in Music, Programlama

