Wednesday 27 January 2010

A bit about playlists and similarity

Sorry about the general radio silence of late. Many things going on, most of them interesting.
Lately I've been spending quite a bit of time considering various aspects of playlist generation and how they all fit together. Here are some of my lines of thought:
  1. Evaluation of a playlist. How? Along which dimension? (Good v. Bad, Appropriate v. Offensive, Interesting v. Boring)
  2. How do people in various functions create playlists? How do this process and its output compare to common (or state-of-the-art) methods employed in automatic playlist construction? That is to say, are we doing it right? Are the correct questions even being asked?
  3. What is the relationship between notions of music similarity (or pairwise relationship in the generic) and playlist construction?

While all these ideas are interrelated, for now I'm going to pick at point (3) a bit. I'm coming to believe it is central to understanding the other two points as well, at least to an extent. There are many ways to consider how two songs are related. In music informatics this similarity is almost always content-based, even when it isn't content-derived. This includes methods based on timbral or harmonic features, and most tags or similar labels (though these sometimes drift away from content descriptors). That paints some kind of picture, but it leaves out something that can be critical to manual playlist construction as it is commonly understood (e.g. in radio or the creation of a 'mixtape'): socio-cultural context. In order to have the widest array of possible playlist constructions, you need as complete an understanding as possible of the relationships between member songs (not just neighbors...). Put another way, the complexity of your playlist is bounded above by the complexity of your similarity measure.
M ≤ C·S
where M is some not-yet-existent measure of the possible semantic complexity of a playlist and S is the corresponding measure of the semantic complexity of the similarity measure used in the construction of that playlist. C is our fudge-factor constant. Now, obviously there are plenty of situations where complex structure isn't required. But if the goal is to make playlists for a wide range of functions and settings, it will be required sometimes.

In practice what this means is that you can make a bag of songs from a bag of features. However, imparting long-form structure depends, at a minimum, on a much more complex understanding of the relationships (e.g. similarity) between songs (say, from social networks or radio logs...).

Anyway, this is all a bit vague right now. I'm working on some better formalization, we'll see how that goes. Anyone have any thoughts?


3 comments:

Brian McFee said...

Ben, you've been reading my wishlist!

Here's my basic take on it:

It's useful to draw a distinction between having a baseline measure of similarity between songs, and an application-specific measure of similarity. Think of this as the difference between passive recommendation and active search.

A simple way to model this is with a Markov chain. Your transition probabilities P(X2|X1=x) capture raw song similarity, and the emission probabilities P(F|X=x) capture the user-driven search aspects of creation (e.g., "I want the song in position 1 to have high energy, and song 6 should be mellow"). Then you just run inference on the supplied observations, and out pops a mix.
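
Here's a toy sketch of what I have in mind (Python; the song names, the transition matrix, and the per-position "energy"/"mellow" scores are all made up for illustration, not taken from any real system):

```python
# Toy HMM-style mix generator: transitions come from song similarity,
# per-position scores come from the user's constraints. Numbers are made up.
import numpy as np

songs = ["song_a", "song_b", "song_c", "song_d"]

# P(X_{t+1} | X_t): rows sum to 1; in practice derived from a similarity measure.
trans = np.array([
    [0.1, 0.5, 0.3, 0.1],
    [0.4, 0.1, 0.4, 0.1],
    [0.2, 0.3, 0.1, 0.4],
    [0.3, 0.2, 0.4, 0.1],
])

# P(F_t | X_t): how well each song matches the request at position t,
# e.g. "high energy at position 0, mellow at position 3".
emit = np.array([
    [0.9, 0.2, 0.1, 0.1],  # position 0: high energy
    [0.5, 0.5, 0.5, 0.5],  # position 1: no constraint
    [0.5, 0.5, 0.5, 0.5],  # position 2: no constraint
    [0.1, 0.2, 0.3, 0.9],  # position 3: mellow
])

def viterbi_mix(trans, emit):
    """Most likely song sequence given transitions and per-position scores.
    (Note: nothing here stops a song from being picked twice -- one of the
    details being swept under the rug.)"""
    n_pos, n_songs = emit.shape
    log_t, log_e = np.log(trans), np.log(emit)
    score = log_e[0].copy()                 # best log-prob ending in each song
    back = np.zeros((n_pos, n_songs), int)  # best predecessor per position
    for t in range(1, n_pos):
        cand = score[:, None] + log_t       # score of moving song i -> song j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_e[t]
    path = [int(score.argmax())]
    for t in range(n_pos - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [songs[i] for i in reversed(path)]

print(viterbi_mix(trans, emit))
```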

Of course, this sweeps a lot of technical details under the rug, but I suspect something along these lines may be about the right level of abstraction for what you're talking about.

I've hacked together the first half of this type of system (code and description at http://arizonabayes.blogspot.com/2009/11/travelling-salesmix.html), but it's quite far from perfect.

ben said...

Gah! Your probability models could practically have come out of my log book! Yes Yes Yes.
Although I'm especially interested in incorporating non-content-based features (i.e. social and cultural features) into these probability models, which I don't see as incompatible as long as the notion of similarity is quantifiable.

Have you considered higher-order (i.e. more than one dimension) similarity measures? Or, for that matter, the really annoying fact that when you consider notions like 'flow', what really matters isn't song-to-song similarity but the similarity from the last played bit of song A (not necessarily the end) to the first played bit of song B? In the context of your TSP solution this could be dealt with via asymmetric weighted directed graphs.

On another related note that may be of interest, I'm poking around at training expectation models (probably tied to tags) as a means to evaluate playlists that are made via other means (like weighted social graphs, say...).

Brian McFee said...

Hah, now you've been reading my log book!

1) Non-content features are totally workable in this framework, provided you have a principled way to integrate them. That was sort of the point of my paper at ISMIR last year. :)

2) Yes, edge transitions are crucial, assuming the user isn't skipping (which is a big assumption). It's certainly possible to encode asymmetric relations, but TSP probably isn't the right tool for the job because you lose the metric approximation guarantees. A more robust way is to map each song to a set of states, with similarity extracted from specific regions: two states for just inbound/outbound, three if you want to include an overall description as well.
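
To make that concrete, here's a toy sketch (Python again; the intro/outro feature vectors and the cosine similarity are stand-ins, not a recommendation -- plug in whatever feature comparison you actually trust):

```python
# Toy sketch: give each song an "in" and an "out" summary so transitions can
# use outro->intro similarity, which is naturally asymmetric.
import numpy as np

# Pretend these summarise the first and last ~15 seconds of each song.
intro_features = {
    "song_a": np.array([0.2, 0.9]),
    "song_b": np.array([0.8, 0.1]),
    "song_c": np.array([0.5, 0.5]),
}
outro_features = {
    "song_a": np.array([0.7, 0.2]),
    "song_b": np.array([0.3, 0.6]),
    "song_c": np.array([0.1, 0.9]),
}

def transition_weight(a, b):
    """Similarity between the end of song a and the start of song b."""
    x, y = outro_features[a], intro_features[b]
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))

# Directed edge list: weight(a -> b) != weight(b -> a) in general.
edges = {
    (a, b): transition_weight(a, b)
    for a in intro_features
    for b in intro_features
    if a != b
}

for (a, b), w in sorted(edges.items(), key=lambda kv: -kv[1]):
    print(f"{a} -> {b}: {w:.3f}")
```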

(Double-related, collecting training data for this type of system is what I was getting at with quiztape. I should really get back to that one...)

3) Your last point is definitely the killer here: evaluation be a harsh mistress. Adjacent-song evaluations are inherently dissatisfying, but I'm not sure what more you can do in a rigorous way.