Stuff. Also, things.: womrad live blog part the last

Sunday, 26 September 2010

last session: Long tail stuff:

Kibeom Lee (presenting), Woon Seung Yeo and Kyogu Lee

focusing on popularity bias - referencing Oscar's thesis work (Help! I'm stuck in the head)
Goal: keep the awesome of collaborative filtering but sort out popularity bias
the mystery of unpopular but 'loved' songs on last.fm -- shouldn't loved songs be played frequently... perhaps an area of music the user likes but doesn't venture very far into
'My tail is your head' - find the users who have a 'head' that overlaps with your 'tail' to draw recs from
personal story about how this idea came about -- one person's popularity bias is another person's novel rec.
refs Oscar and Paul's ISMIR 07 rec tutorial - this system is geared toward the top half of the user type pyramid
scraped last.fm to get more tracks per user (API gives 50/user; scraping gives 500)
lots of tracks (about 9 million)
eval by asking users to rate recs from the proposed algorithm v. the traditional model on a 1-5 scale
promo'd the website in various ways, but not too much response
but, the limited response did show some improvement over traditional approach
overall - some improvement, much potential
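
[a rough sketch of the 'my tail is your head' idea, to make it concrete. This is not the authors' algorithm: the notes don't say how head and tail were defined or how candidates were scored, so the top-N split, the overlap score, and every name below are my assumptions.]

```python
from collections import Counter


def split_head_tail(plays, head_size):
    """Split one user's play counts into (head, tail) sets of tracks.

    Assumption: the head is simply the user's `head_size` most played tracks.
    """
    ranked = sorted(plays, key=lambda t: (-plays[t], t))
    return set(ranked[:head_size]), set(ranked[head_size:])


def tail_is_head_recs(target, all_plays, head_size=2, top_n=5):
    """Recommend unheard tracks from the heads of users whose head overlaps
    the target user's tail. Score = size of that overlap (an assumption)."""
    _, my_tail = split_head_tail(all_plays[target], head_size)
    seen = set(all_plays[target])
    scores = Counter()
    for user, plays in all_plays.items():
        if user == target:
            continue
        their_head, _ = split_head_tail(plays, head_size)
        overlap = len(my_tail & their_head)
        if overlap == 0:
            continue  # this user's head says nothing about my tail
        for track in their_head - seen:
            scores[track] += overlap
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Toy data (made up): ann mostly plays pop but dabbles in jazz.
plays = {
    "ann": {"pop1": 90, "pop2": 80, "jazz1": 3, "jazz2": 2},
    "bob": {"jazz1": 70, "jazz3": 60, "jazz2": 5, "pop1": 1},
    "cat": {"pop1": 99, "pop2": 95, "pop3": 90},
}
print(tail_is_head_recs("ann", plays))
```

[bob's head overlaps ann's tail, so ann gets drawn toward bob's other favourite; cat shares ann's head, not her tail, so cat contributes nothing.]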

Q how many users?: see above

Q so were your recs in the global head?:

sorta, mostly in the midsection

Mark Levy (presenting) and Klaas Bosteels

an overview of lit showing various rec bias especially the idea of positive feedback reinforcing the head (not this kind of bias though)
this work looks at 7 billion scrobbles: all scrobbles from Jan - Mar this year (holy crap, that's some scale)
recs just from the last.fm radio
how do you define the long tail? use a fixed ref of overall artist ranks (number of listeners from last) + a fit model ~50-60k artists in the 'head'
looked at rec radio, non-rec radio, all music
the last.fm radio has less head bias than general listening, but only just
used an experimental cohort of listeners: new, active, but not insane spamming amounts of scrobbling. two subsets: radio users and not so much
this shows very little difference in the non-radio long tail listening among those who use last.fm radio v. those who don't
but: perhaps there's some demographic trouble
so split radio users into high users and low users
still no tail bias to speak of
perhaps from the fact that real systems only rec new tracks, mitigating reinforcement
so: built a simple item-based rec which limited candidates to the 'play direct-from-artist' scheme and was not allowed to recommend artists with more than 10,000 fans
deployed on playground.last.fm
eval based on a sample of the last.fm user traffic
effectively pushes curve out another order of magnitude
try online
[me: this is great!]
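
[a minimal sketch of an item-based rec with a popularity cap, as I understood it. Only the 10,000-fan limit comes from the talk; the cosine similarity over listener sets, the data layout, and the function names are my assumptions, and the 'play direct-from-artist' restriction is not modelled here.]

```python
import math


def cosine(a, b):
    """Cosine similarity between two listener sets (treated as binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def capped_item_recs(user, listeners, fans, max_fans=10000, top_n=5):
    """Item-based recs for `user`, skipping artists with more than `max_fans` fans.

    listeners: artist -> set of users who listened to that artist
    fans:      artist -> fan count (a missing artist is treated as 0 fans)
    """
    known = {artist for artist, who in listeners.items() if user in who}
    scores = {}
    for cand, who in listeners.items():
        if cand in known or fans.get(cand, 0) > max_fans:
            continue  # already heard, or too popular to be a candidate
        score = sum(cosine(who, listeners[k]) for k in known)
        if score > 0:
            scores[cand] = score
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


# Toy data (made up for illustration).
listeners = {
    "big_band": {"u1", "u2", "u3"},
    "mega_star": {"u2", "u3", "u4"},
    "tiny_act": {"u2", "u3"},
    "obscure_duo": {"u3"},
}
fans = {"big_band": 500000, "mega_star": 2000000, "tiny_act": 800, "obscure_duo": 40}

print(capped_item_recs("u1", listeners, fans))
```

[with the cap in place mega_star never shows up, however similar it is; raise max_fans and it slots back into the list.]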

Q Do you see a problem, in terms of scholarship, with the fact that in practice you have access to all this data and the public does not?

well, hrm. how about being an intern

Q Does this make better recs?

Better, eh, interesting sure.
