Sunday, 26 September 2010

womrad live blog part the last

last session: Long tail stuff:

Kibeom Lee (presenting), Woon Seung Yeo and Kyogu Lee
  • focusing on popularity bias - referencing Oscar's thesis work (Help! I'm stuck in the head)
  • Goal: keep the awesome of collaborative filtering but sort out popularity bias
  • the mystery of unpopular but 'loved' songs on -- shouldn't loved songs be played frequently? perhaps they sit in an area of music the user likes but doesn't venture very far into
  • 'My tail is your head' - find the users who have a 'head' that overlaps with your 'tail' to draw recs from
  • personal story about how this idea came about -- one person's popularity bias is another person's novel rec.
  • refs Oscar and Paul's ISMIR 07 rec tutorial - this system is geared toward the top half of the user type pyramid
  • scraped to get more tracks per user (API gives 50/user, scraping gives 500)
  • lots of tracks (about 9million)
  • eval: asked users to rate recs from the proposed algorithm v. a traditional model on a 1-5 scale
  • promo'd the website in various ways, but not too much response
  • but the limited response did show some improvement over the traditional approach
  • overall - some improvement, much potential
Q how many users?: see above
Q so were your recs in the global head?:
sorta, mostly in the midsection
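
[a rough sketch of the 'my tail is your head' idea as I understood it from the talk, not the authors' code. Assumptions of mine: each user's 'head' is simply their top `head_size` items by play count, neighbours are scored by how many of my tail items sit in their head, and recs are drawn from the neighbour's head. All names and the toy data are made up.]

```python
from collections import Counter


def split_head_tail(plays, head_size):
    """Split one user's {item: play_count} into (head, tail) sets.

    Assumption: the head is just the top `head_size` items by play count.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(plays, key=lambda item: (-plays[item], item))
    return set(ranked[:head_size]), set(ranked[head_size:])


def tail_is_your_head(target, all_plays, head_size=3, top_n=3):
    """Recommend items from users whose head overlaps the target's tail."""
    _, my_tail = split_head_tail(all_plays[target], head_size)
    already_heard = set(all_plays[target])
    scores = Counter()
    for user, plays in all_plays.items():
        if user == target:
            continue
        their_head, _ = split_head_tail(plays, head_size)
        overlap = len(my_tail & their_head)
        if overlap == 0:
            continue  # this user's head has nothing to do with my tail
        for item in their_head - already_heard:
            scores[item] += overlap  # weight by how strongly we overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Toy data (hypothetical users and play counts).
all_plays = {
    "ann": {"radiohead": 50, "muse": 40, "coldplay": 30, "sun_ra": 2, "moondog": 1},
    "bob": {"sun_ra": 60, "moondog": 45, "alice_coltrane": 40, "muse": 1},
    "cat": {"radiohead": 70, "coldplay": 50, "u2": 30, "muse": 5},
}

print(tail_is_your_head("ann", all_plays))
```

[in the toy data ann's tail (sun_ra, moondog) is bob's head, so bob's other head item is what she gets; cat shares ann's head rather than her tail, so cat contributes nothing.]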

Mark Levy (presenting) and Klaas Bosteels
  • an overview of the lit showing various rec biases, especially the idea of positive feedback reinforcing the head (not this kind of bias though)
  • this work looks at 7 billion scrobbles: all scrobbles from Jan - Mar this year (holy crap, that's some scale)
  • recs just from the radio
  • how do you define the long tail? use a fixed ref of overall artist ranks (number of listeners from last) + a fit model; ~50-60k artists end up in the 'head'
  • looked at rec radio, non-rec radio, all music
  • the radio has less head bias than general listening, but only just
  • used an experimental cohort of listeners: new, active, but not insane spamming amounts of scrobbling. two subsets: radio users and not so much
  • this shows very little difference in the non-radio long tail listening among those who use radio v. those who don't
  • but: perhaps there's some demographic trouble
  • so split radio users into high users and low users
  • still no tail bias to speak of
  • perhaps from the fact that real systems only rec new tracks, mitigating reinforcement
  • so: built a simple item-based rec which limited candidates to the 'play direct-from-artist' scheme, and was not allowed to recommend artists with more than 10000 fans
  • deployed on
  • eval based on a sample of the user traffic
  • effectively pushes curve out another order of magnitude
  • try online
  • [me: this is great!]
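
[to make the 'head bias' comparison concrete, here is a minimal sketch of one way to measure it: the share of plays going to artists inside a fixed global head. This is my assumption about the measurement, not the method from the talk (which also used a fit model); the function name, ranks and play counts are invented and are not the talk's results.]

```python
def head_share(plays, global_rank, head_cutoff):
    """Fraction of plays that go to artists ranked within `head_cutoff`.

    plays:        {artist: play_count} for one listening channel
    global_rank:  {artist: rank}, 1 = most listeners overall
    head_cutoff:  ranks <= this value count as the 'head'
    Artists missing from `global_rank` are treated as tail.
    """
    total = sum(plays.values())
    if total == 0:
        return 0.0
    in_head = sum(
        count
        for artist, count in plays.items()
        if global_rank.get(artist, float("inf")) <= head_cutoff
    )
    return in_head / total


# Invented toy numbers, only to show the comparison.
global_rank = {"a": 1, "b": 2, "c": 3, "d": 4}
all_listening = {"a": 60, "b": 20, "c": 15, "d": 5}
rec_radio = {"a": 30, "b": 20, "c": 30, "d": 20}

print(head_share(all_listening, global_rank, head_cutoff=2))
print(head_share(rec_radio, global_rank, head_cutoff=2))
```

[a lower share for one channel than another would mean less head bias there, which is the kind of comparison made between rec radio, non-rec radio and all music.]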
Q Do you see a problem, in terms of scholarship, with the fact that in practice you have access to all this data and the public does not?
well, hrm. how about being an intern
Q Does this make better recs?
Better, eh, interesting sure.
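
[a hedged sketch of the fan-capped item-based rec mentioned above. The talk didn't spell out the similarity measure, so cosine similarity over listener sets is my assumption, and I ignore the 'play direct-from-artist' candidate restriction entirely. `max_fans` stands in for the 10000-fan cap; the names and data are hypothetical.]

```python
from math import sqrt


def item_similarity(listeners_a, listeners_b):
    """Cosine similarity between two artists' listener sets (an assumption)."""
    if not listeners_a or not listeners_b:
        return 0.0
    shared = len(listeners_a & listeners_b)
    return shared / sqrt(len(listeners_a) * len(listeners_b))


def recommend_tail_artists(user, listeners, max_fans, top_n=5):
    """Item-based recs, skipping any artist with more than `max_fans` listeners.

    listeners: {artist: set of user ids who listen to that artist}
    """
    liked = {artist for artist, fans in listeners.items() if user in fans}
    scores = {}
    for candidate, fans in listeners.items():
        if candidate in liked or len(fans) > max_fans:
            continue  # already known to the user, or too popular
        score = sum(item_similarity(fans, listeners[a]) for a in liked)
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


# Toy data: the cap is 2 fans here instead of 10000.
listeners = {
    "big_band": {"u1", "u2", "u3", "u4", "u5"},
    "mid_band": {"u1", "u2", "u3"},
    "tiny_band": {"u2", "u3"},
    "obscure_duo": {"u3"},
    "unrelated": {"u5"},
}

print(recommend_tail_artists("u1", listeners, max_fans=2))
```

[the cap is applied to candidates only: popular artists still count as evidence of what the user likes, they just can't be recommended.]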

And WOMRAD done. feedback is elicited
