Saturday, 29 May 2010

publications on playlists in ISMIR

So for this year's ISMIR I'll be doing another tutorial. This one is entitled "Finding A Path Through The Jukebox – The Playlist Tutorial" and I'll be presenting it with Paul Lamere. As you may have guessed from the title, it's all about playlists. To frame some of my background work, I thought I'd poke around the ISMIR proceedings to get a more complete idea of all the papers that dealt with the topic across the 10 years of proceedings (plus the just-announced titles from this year).

First I did a simple title search using the tool at ismir.net. This shows that from 2000 to 2009 there were 14 papers with 'playlist' occurring somewhere in the title. Here they are over time:


Well, that doesn't show very much, just some interest, no trends or anything. So from there I took a look at the results of the full-text search available from Rainer Typke's website. The full-text search found some 123 papers mentioning 'playlist', certainly a few more than the title search. From there I wanted to see what the distribution of these papers was over time (as above), though this took a bit more work, as I couldn't sort out a means to export the search results... Anyway, after a bit of counting I got this:

Well, now we're getting somewhere! Clearly there's an increasing number of papers discussing playlists at ISMIR. But wait, you say, this doesn't take into account the considerable expansion of the conference over its existence. So we can normalize by the number of papers per year known to the Cumulative ISMIR proceedings ([35, 43, 62, 56, 108, 119, 99, 131, 111, 148] from 2000 to 2009, if anyone is interested). Below you can see both the title-only and full-text search results normalized to the total number of papers:
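The normalization itself is simple enough to sketch in a few lines of python. The per-year totals below are the real ones quoted above; the per-year hit counts are made-up placeholders, since the post only gives the overall totals (14 title hits, 123 full-text hits), not the yearly breakdown:

```python
# Normalize per-year 'playlist' hit counts by the total ISMIR papers that year.
# totals: real numbers from the Cumulative ISMIR proceedings (2000-2009).
# hits:   hypothetical per-year full-text hit counts, for illustration only.
totals = [35, 43, 62, 56, 108, 119, 99, 131, 111, 148]
hits = [2, 3, 5, 4, 9, 12, 10, 18, 22, 38]

normalized = [h / float(t) for h, t in zip(hits, totals)]
for year, frac in zip(range(2000, 2010), normalized):
    print("%d: %.1f%% of papers mention 'playlist'" % (year, frac * 100))
```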

The normalization didn't seem to change the trend much. But this leaves me wondering: what can be drawn from the massive (and growing) disparity between title mentions and full-text mentions? Obviously one would expect a higher number of hits, but a roughly tenfold increase seems very large. My first suspicion is that a great deal of this disparity comes from the fact that many papers at ISMIR that mention playlists are actually about something else (music similarity, for instance) and then throw on a playlist as something of an afterthought. Perhaps this is an implicit acknowledgment of the great human-factor power of the playlist (as discussed in, for instance, this paper) or perhaps it's something else entirely.

Regardless of these finer points, it's clearly fair to say that there is a great deal of interest in playlist generation and analysis. If you're interested in these things, why not sign up for our tutorial?

Sunday, 21 March 2010

facelift

Hi blog readers. You may notice a slight change of scenery and the addition of some links just above the main text body. The links go to the other parts of my homepage, and the color scheme shift is to keep everything consistent. I'm not much of a designer, so I'll happily take any critique of the color scheme and such...

Monday, 1 March 2010

IEEE-THEMES --shameless self promotion--

I'm going to be presenting work at IEEE-THEMES, a workshop co-located with ICASSP, on March 15th in Dallas, TX. The talk is associated with an article to be published in the August issue of Selected Topics in Signal Processing, which is a special issue on signal processing and social networks. Here's the title/abstract (note: link is to a preprint; the camera-ready isn't due till after the talk, so the paper may well change a touch...):


Abstract—This paper presents an extensive analysis of a sample of a social network of musicians. The network sample is first analyzed using standard complex network techniques to verify that it has similar properties to other web-derived complex networks. Content-based pairwise dissimilarity values between the musical data associated with the network sample are computed, and the relationship between those content-based distances and distances from network theory explored. Following this exploration, hybrid graphs and distance measures are constructed, and used to examine the community structure of the artist network. Finally, results of these investigations are presented and considered in the light of recommendation and discovery applications with these hybrid measures as their basis.
The paper mostly covers content that has been discussed elsewhere (much of it with Kurt Jacobson), refactored for a broader audience and with wider narratives in mind. That said, there are some notable new findings in the paper as well. We have run another acoustic dissimilarity measure across the entire set (the 2009 MIREX entry in audio music similarity using Marsyas) which for the most part confirms our earlier findings (that acoustic similarity and social similarity [mostly] aren't linearly correlated, and that community genre labeling becomes more homogeneous [again, mostly] when using the audio similarity as a weight). Additionally, we have broadened our comparison metrics to include an examination of the mutual information between the different dissimilarity sets. This also basically confirms our earlier findings, though mutual information provides a very satisfying level of nuance that is not possible from simply testing (using Pearson's) for linear correlation, especially given that our data is quite far from a normal distribution. So, if you're planning to be at ICASSP, I'd highly recommend IEEE-THEMES (the rest of the program looks to be very interesting as well...) and if you aren't going to be in Dallas, there are a few options for you:
  1. If you're in London right now, you can come to Goldsmiths today at 4pm, to room 144 in the main building, where I'll be giving a trial run of the talk.
  2. Slides (and perhaps some video) will be made available at some point (probably just after the talk is given).
  3. IEEE is running a pay-to-watch live stream of THEMES, so there's that as well.
Generally, if you're going to be in Dallas from March 15-19, much discussion can happen in person. Also, between now and then I'll be doing some traveling (tomorrow till 6 March I'll be at UIUC, then from there till the 14th of March I'll be in San Diego), so if any readers are interested in some in-person discussion and our locations overlap, let me know and perhaps something can be arranged.
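The mutual-information point is easy to illustrate with a small numpy sketch. The data here is synthetic (a noisy nonlinear relationship standing in for two dissimilarity sets); the point is that Pearson's r sees almost nothing while a histogram estimate of mutual information clearly picks up the dependence:

```python
import numpy as np

# Synthetic example: y depends on x, but not linearly, so Pearson misses it.
rng = np.random.RandomState(0)
x = rng.rand(5000)
y = (x - 0.5) ** 2 + 0.05 * rng.rand(5000)

# Linear correlation: near zero by symmetry of the parabola.
r = np.corrcoef(x, y)[0, 1]

# Mutual information (in bits) from a 2-D histogram estimate.
joint, _, _ = np.histogram2d(x, y, bins=20)
pxy = joint / joint.sum()
px = pxy.sum(axis=1, keepdims=True)
py = pxy.sum(axis=0, keepdims=True)
nz = pxy > 0  # avoid log(0) on empty bins
mi = np.sum(pxy[nz] * np.log2(pxy[nz] / (px * py)[nz]))

print("Pearson r = %.3f, MI = %.3f bits" % (r, mi))
```

Here r comes out near zero while the mutual information is clearly positive, which is the nuance mentioned above.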

Thursday, 11 February 2010

scipy and numpy from source, revisited

A while back I posted some instructions for getting scipy and numpy mostly up and running from current svn checkouts under python 2.6 on Mac OS X 10.5.8. I updated to 10.6 some time back and have been using the preinstalled version of numpy (1.2) for my array needs, without any scipy, with solid results. However, I needed to get at some scipy functionality (doing some mutual information analysis via pyentropy), so I thought I'd give the process a go with the newer OS version. I'm pleased to report that everything works and was relatively easy to install/build. Basically the old instructions still hold, with a couple of additional points.

  1. It is necessary to update to a newer version of numpy, compiled with the same fortran compiler you'll use for scipy.
  2. If you're using the build of macpython that comes with 10.6 (which is py2.6), you'll need to add the option --install-lib=/Library/Python/2.6/site-packages/ to any commands using distutils to install (e.g. setup.py install)
And that's about it. I used fresh checkouts of scipy (r6233, v0.8.0.dev) and numpy (r8106, v1.5.0.dev), but the same versions of SuiteSparse and gFortran I've had for a while (the details of which are in the old post). As a bonus, this seems to result in fewer unit test failures in scipy (now only 10!), for whatever that's worth.
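As a quick sanity check after a rebuild like this, it's handy to confirm which version and build configuration python is actually picking up. This is a minimal sketch using numpy's own introspection (the same version attribute exists on scipy):

```python
import numpy

# Confirm the freshly built version is the one being imported...
print(numpy.version.version)

# ...and list the BLAS/LAPACK libraries it was built against, which
# reflects the compiler setup detected at build time.
numpy.__config__.show()
```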

Wednesday, 27 January 2010

MusicHackday: Stockholm

So in a touch more than 48hrs I'll be hopping on a plane to go to the Stockholm MusicHackday. It should be excellent, if the last one I went to is anything to judge by. I'll be joined by fellow ISMS member Mike Jewell. The hack is being formulated, but may involve The World Bank's API and some yet-to-be-determined sources of listener statistics. Also, somehow the Echo Nest's API will be involved, because I need to leave Stockholm with one of these. We may need some further assistance to get something done in 24hrs, so if you're going to be at the hack and are looking for some folk to hack with, drop a line in the comments...

A bit about playlists and similarity

Sorry about the general radio silence of late. Many things going on, most of them interesting.
Lately I've been spending quite a bit of time considering various aspects of playlist generation and how they all fit together. Here are some of my lines of thought:
  1. Evaluation of a playlist. How? Along which dimension? (Good v. Bad, Appropriate v. Offensive, Interesting v. Boring)
  2. How do people in various functions create playlists? How does this process and its output compare to common (or state-of-the-art) methods employed in automatic playlist construction? This is to say, are we doing it right? Are the correct questions even being asked?
  3. What is the relationship between notions of music similarity (or pairwise relationship in the generic) and playlist construction?

While all these ideas are interrelated, for now I'm going to pick at point (3) a bit. I'm coming to believe it is central to understanding the other two points as well, at least to an extent. There are many ways to consider how two songs are related. In music informatics this similarity is almost always content-based, even if it isn't content-derived. This can include methods based on timbral or harmonic features, or most tags or similar labels (though these sometimes get away from content descriptors). This paints some kind of picture, but leaves out something that can be critical to manual playlist construction as it is commonly understood (e.g. in radio or the creation of a 'mixtape'): socio-cultural context. In order to have the widest array of possible playlist constructions, it is necessary to have as complete an understanding as possible of the relationships between member songs (not just neighbors...). Put another way, the complexity of your playlist is maximally bounded by the complexity of your similarity measure.
M ≤ C·s
Where M is some not-yet-existent measure of the possible semantic complexity of a playlist, s is a similar measure of the semantic complexity of the similarity measure used in the construction of that playlist, and C is our fudge-factor constant. Now, obviously there are plenty of situations where complex structure isn't required. But if the goal is to make playlists for a wide range of functions and settings, it will be required sometimes.

In practice what this means is that you can make a bag of songs from a bag of features. However, imparting long-form structure is at a minimum dependent on a much more complex understanding of the relationships (e.g. similarity) between songs (say, from social networks or radio logs...).
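To make that concrete, here's a toy sketch of the kind of construction I mean: a playlist built by a greedy nearest-neighbour walk over nothing but a pairwise similarity matrix. The song names and similarity values are entirely made up; the point is that whatever structure the playlist ends up with is bounded by what the similarity measure encodes:

```python
import numpy as np

# Hypothetical tracks and a made-up symmetric similarity matrix.
songs = ["A", "B", "C", "D", "E"]
sim = np.array([
    [1.0, 0.9, 0.2, 0.4, 0.1],
    [0.9, 1.0, 0.3, 0.5, 0.2],
    [0.2, 0.3, 1.0, 0.8, 0.6],
    [0.4, 0.5, 0.8, 1.0, 0.7],
    [0.1, 0.2, 0.6, 0.7, 1.0],
])

def greedy_playlist(start, sim, songs):
    """Repeatedly append the most similar not-yet-used song."""
    order = [start]
    while len(order) < len(songs):
        cur = order[-1]
        candidates = [(sim[cur][j], j) for j in range(len(songs))
                      if j not in order]
        order.append(max(candidates)[1])
    return [songs[i] for i in order]

print(greedy_playlist(0, sim, songs))  # -> ['A', 'B', 'D', 'C', 'E']
```

The walk only ever sees local pairwise values, so it can produce a smooth chain but never, say, deliberate long-range contrast; that would need a richer relationship measure, which is the point of the inequality above.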

Anyway, this is all a bit vague right now. I'm working on some better formalization, we'll see how that goes. Anyone have any thoughts?


Wednesday, 19 August 2009

Compiling bleeding edge SciPy on Mac OS X

I do most of my number-crunching tasks with SciPy these days, having basically kicked the matlab habit, with the brief exception of occasional use of legacy libraries. SciPy is a joy to work with, but is a huge pain to build from source, in light of nasty dependencies (fortran things mostly) and some system-specific hardware acceleration trickiness. Thankfully most users can download one of many prebuilt packages, perhaps the best being Enthought's. If you've ever wanted to see what SciPy is all about, this is the easiest way to do so.

That said, one of the great things about using SciPy instead of matlab is that it's python. Except all the prebuilt binaries (to my knowledge anyway) use at newest python 2.5. Again, probably not a problem for most, but I use the nice socket library (amongst other things) that's been improved significantly in python 2.6. So for a while I had my SciPy python and my everything-else python, and every so often I would make another attempt at building SciPy for py2.6 on my mac to integrate the two, and every time it would defeat me.

Until yesterday.

So I'm going to attempt to fully document the process, as I've now done it on 2 similar machines (home & lab), and now that I've figured out the tricky bits it seems fairly easy to reproduce. These instructions were followed on 1-2 year old Intel-based Macs running 10.5.8. (Note these instructions don't touch on installing the other pieces of a standard SciPy setup, ipython and pylab/matplotlib, as I've never had much trouble getting those to build. I believe the easy_install process works for both, mostly.)

  1. (optional) If you don't want to build universal python modules remove "-arch ppc -arch i386" from the BASECFLAGS and the LDFLAGS in the python library Makefile, which should live somewhere around here: /Library/Frameworks/Python.framework/Versions/Current/lib/python2.6/config/Makefile
  2. If you don't already have Xcode 3.1.3 and the associated developer tools, you need to get it for Apple's custom build of gcc 4.2 (it's not the version that comes with most box copies of 10.5). Download and install a fresh copy of the Apple Developer Tools. You can get SciPy to compile with other variants of gcc 4.2 or greater (from MacPorts for instance), but they don't support Apple-specific options, which are very helpful in other situations.
  3. Download and install gFortran as a patch to the Apple gcc from AT&T Research. Why Apple doesn't leave gfortran in gcc I don't know, but they don't, and we need it. It's critical you use this fortran compiler, as other variants of gfortran or g77 seem to cause errors.
  4. Download and install UMFPACK and AMD from SuiteSparse. The easiest way I've gotten through this is to download the entire SuiteSparse and then do the following:
    1. Modify the package-wide config makefile found at SuiteSparse/UFconfig/UFconfig.mk by uncommenting the Macintosh options (currently lines 299 - 303)
    2. In order to only compile the 2 packages we also need to modify the high level makefile (SuiteSparse/Makefile) by commenting out the references to the other packages under the default call (currently lines 10, 12-17, 19-24).
    3. run make while in the SuiteSparse dir
    4. because it would be too easy if SuiteSparse actually had an install routine, we have to install the just compiled libs ourselves. This is how I did it, though you can stick all these bits wherever you like as long as the python compiler and linker will see them:
      $sudo install UMFPACK/Include/* /usr/local/include/
      $sudo install AMD/Include/* /usr/local/include/
      $sudo install UMFPACK/Lib/* /usr/local/lib/
      $sudo install AMD/Lib/* /usr/local/lib/
      $sudo install UFconfig/UFconfig.h /usr/local/include/
  5. Grab a bleeding edge copy of SciPy and NumPy via their subversion repositories:
    $svn co http://svn.scipy.org/svn/numpy/trunk numpy-from-svn
    $svn co http://svn.scipy.org/svn/scipy/trunk scipy-from-svn
  6. Build and install NumPy:
    $cd numpy-from-svn
    $sudo python setup.py build --fcompiler=gnu95 install
  7. Test NumPy to make sure it's not broken (note that the tests need to be run out of the build directory):
    $cd ..
    $python -c "import numpy;numpy.test()"

    Make sure numpy doesn't fail any of the tests (known fails and skips are okay) or the next bit may not work.
  8. Similar to step 6, build and install SciPy:
    $cd scipy-from-svn
    $sudo python setup.py config_fc --fcompiler=gfortran install
  9. Similar to step 7, move out of the build directory and run the built-in tests:
    $cd ..
    $python -c "import scipy;scipy.test()"

    You're going to get some fails and maybe some errors. You're going to have to use your own judgment as to whether these errors and fails are substantial. Most of the troubles I've encountered are trivial, things like a type being dtype('int32') instead of 'int32', which is actually the same and just needs to be updated to reflect newer numpy.
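For what it's worth, that dtype mismatch really is cosmetic; a quick python check shows the two forms compare equal even though their reprs differ, which is why such test failures can usually be ignored:

```python
import numpy as np

# The kind of 'failure' mentioned above: the test compares a repr string,
# but the underlying types compare equal.
assert np.dtype('int32') == 'int32'          # equal as dtypes
assert repr(np.dtype('int32')) != "'int32'"  # but the reprs differ
print("equal as dtypes, different as strings")
```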
And now you have a nice SciPy build for whatever flavor of python you're working with on your Mac. Note that I have no idea how well this will work in anything other than python 2.6 on Mac OS X 10.5.8, though it will probably mostly work with other variants. Also, for completeness, I most recently compiled these versions: NumPy-r7303, SciPy-r5893. At some point I'm going to give it a go with python 3.x, but that will be a whole new kind of pain, I suspect. Anyway, if anyone uses these instructions and they don't quite work, or you don't understand part of it, please let me know and I'll try to clarify or help as best I can. I'd really love to build a definitive set of instructions for building SciPy on a Mac, but I can only verify these instructions on my machines.