Friday, 29 June 2012

Licensed listening based on the habits of pirates or lessons from sloppy item resolution

from flickr user nozoomii, CC by-nc-sa
So I spent Thursday and Friday a couple weeks ago at the Barcelona Music Hackday, part of the Sónar Music Festival. There were loads of excellent hacks (full list), including my own, Legalize It!. In this post I'm going to go into a bit more depth about the hack, lessons learned and teasers for things I might do.


The core idea is a simple one – straightforward listening to things that are popular on Bittorrent (note that popular on Bittorrent is a slightly fuzzy concept, since Bittorrent is protocol for ad-hoc distribution, but we'll get back to that in a bit), without all the nastiness (and DNS blocking!) of looking at, say, The Pirate Bay's top music torrents (that's a proxy of tpb btw).  And of course this removes any legal trouble that would be associated with gather and listening to music via those torrent charts.

How it all works

tl;dr - It's a torrent chart metadata-based content resolver, written in python and JS, you can fork the code.

Legalize It! has two parts, client and server. The server is a fairly simple pyramid webserver with two main tasks (it's deployed on heroku). The first involves fetching the torrent charts and resolving torrent release groups to legal streaming albums (Spotify, currently).  This is simply a matter of fetching the daily torrent releasegroup chart from the Musicmetric API (full disclosure, they're my employer and I wrote most of the chart endpoints...) then walking through the top N items that look like albums and matching them to spotify albums. The matching is done through a fantastically naive string title + artist search on spotify's metadata API via the very useful Spotimeta python wrapper. This album resolution process has a simple web interface (if it returns an error try a refresh, heroku workers on the free tier sleep a bit too much), that I mostly built for testing, but can be quite useful without the commitment of installing the Spotify App. In addition to the human readable page, you can get the response back as JSON, which is handy on the client side.

The second job for the server is necessary to help select which songs from the top albums we'll be listening to. The final Spotify app will only select on song per album, so users can get a taste of every album in the top N without having to listen to N complete albums.  But this should be done with care and grace to enhance the listening experience.  Thankfully, with the assistance of The Echo Nest's searchable audio summary song-level features this is quite easy.  Using their API we can search for a song by title and artist name (see a pattern of terrible id matching forming? What can I say, it's a hack.) and get back a set of descriptors that looks like this:

Once we have the set of song-level descriptors for every song in the neighbouring albums, it's simply a matter of minimizing the step size.  I've done a bit of work on playlists before, and this seemed like a reasonable approach.  While there are a number of approaches to step-size minimization, for this particular application we're doing a greedy sort of optimization that goes something like:

  1.  Select the song a in album A and the song b in album B, such that for a given audio descriptor (we'll use dancibility by default, but it could just as easily be a different measure, eg. tempo, loudness) the absolute value of the descriptor of a less the descriptor of b is minimized.  That is to say we're looking for the song-pair from these two albums that are closest in terms of whatever descriptor is being used.
  2. Select song c from album C such that the absolute difference from it's audio descriptor to song b's  is similarly minimized.
  3. Repeat (2) for remaining albums, using the last chosen song against the next album.
It's worth noting that this is not the globally optimal shortest path from the first album to the last album, but it comes with a tremendous advantage -- the first two songs are selected without any need to deal with the rest of the playlist, which can be worked on overtime.  This allows for pseudo real time playlist creation, since we just need to know the next song before the current one is done playing.

To facilitate this algorithm the server has a hook that performs step (1) or (2) on on either a pair of albums or a song and album (specified as spotify URIs) can be accessed via a url that looks like[spotify URI of track or album]/[spotify URI of album]
For example, which returns a bit of JSON showing the closest (by dancability) song pair between My Beautiful Dark Twisted Fantasy by Kanye West and Up All Night by One Direction.  This method also supports using any other audio summary feature being used for distance by adding it to the end of the URI.  For example find the closest song pair from the same two records, but by tempo rather than by dancebility.

These two features are laced together in the client, which is a simple spotify app, that's basically just a small bit of jQuery that grabs the list of the top 25 albums then asynchronously gets the selected track for each album updating the playlist as each track is selected.

The Legalize It! app, as demo'd at the Music Hack Day
If you'd like to install the App, here are some instructions for installation in developer mode.  I'll be cleaning up the UI to conform to Spotify's app guidelines and submitting the app to their App finder thing, but that will take a while.

Going Forward

While this is mostly complete for what it does there are a number of feature adds that I'll be slowly dealing with as time goes on. Highest among these is taking the same idea and applying it to different resolvers (maybe use tomahawk rather than picking one...). There are also some smaller feature adds to the app, things like autoplay once the first track has loaded and switching between different echonest summary features or musicmetric charts (eg. P2P release groups by acceleration).

Also, thanks very much to Spotify, who awarded me a prize for my hack!

So what do you think?  Should we care about the taste of a bunch of peers on Bittorrent? Or am I doing it wrong?