Sunday 26 September 2010

A WOMRAD live blog

I'm in Barcelona today for the Workshop on Music Recommendation and Discovery (WOMRAD). The theme is 'Is Music Recommendation Broken? How Can We Fix It?'
I'm giving a talk at 11am (in about 2 hours) and I'll be doing some (mostly) live updates about the program...

Update (10:05am):
UPDATE (28 Sept 2010, 11:52am): Michael has posted the slides to his talk.
  • The view from outside, as someone in industry who has used and observed recommenders
  • Been there since the beginning (which appears to be about 2000)
  • Recommenders must combine humans and machines
  • Understand both content and listeners, be transparent, embrace emotional/social aspects, optimize for trust
  • What is science? Must be falsifiable (Popper), or solvable, reproducible puzzles (gah, missed the name -- Kuhn, presumably)
  • Puzzle: understand the listener's preferences -- foundations (ISMIR 2001 resolution) -- testable, reusable
  • Lots of metrics though (too many?) (do we need a metric for metrics?)
  • MIREX (summary of the AMS task) (haha, 'it's automated' -- tell that to Andy and Mert) -- very acoustically focused, not exactly recommendation: similarity != recommendation
  • use of statistical measures across datasets, e.g. the Netflix Prize -- but what about discovery? Netflix produces better numbers, but does it produce better recommendations?
  • More holistic measures -- survey users about trust and satisfaction (Swearingen & Sinha) -- may miss UI issues -- practical 'business' metrics, bottom-line measurements -- does this remove the science?
  • An appreciated history of MIR (from a rec POV) -- will stick pic here -- we're currently hitting the 'Wall of Good Recs': now that recs don't suck, it's much harder to test
  • easy to test for bad recs -- hard to test for good recs
  • What if the emerging problems (like UI and trust) are no longer measurable?
  • Is user preference too variable and unstable to be useful?
  • from science to art?
  • 2 options:
  • 1: focus on unsolved MIR: better encoding of preference (more socio-cultural research)
  • What are the limits of the average listener? (hey, it's our playlist survey!) -- playlist Turing tests, understanding artist vs. album vs. track -- can we build tools/games to expand this?
  • listener profile -- can you quantify sonic vs. social preference? -- add relevance layers to search and retrieval
  • 2: adjourn to the Beach
  • Questions:
  • Mark Levy: Do you think you're too embarrassed about good engineering? What about controlled experiments by people like Google/Last.fm? -- Move from science to engineering (this confuses me slightly; ISMIR has always been engineering, not pure science). It is fruitful, but is it science?
  • Claudio: Can you speak a bit about your experience combining human knowledge with algorithms? -- Yes. What do you do with human knowledge? It's tricky. Look for the ideal rec experience -- sitting around with your friends playing records: how do you scale that in a system? It's not about classification -- humans are good at putting things together -- train people to be qualitative assessors
  • Oscar: Since you used to be in college radio, how do you think that experience could inform playlisting? Do you use playlisters? Well, it was only a 1.5-year experience, but it made me think about the groups of listeners. Name-checks John Peel. What about presentation? In terms of what Rovi does: minimally, we can stop making bad playlists (gives an example, then breaks off) -- very hard to differentiate between good, very good, and excellent
  • Me: what about bypassing ordering by selecting good sets?
  • Q (Eugenio Tacchini): how much expert transparency is necessary? Yes, give justification, but avoid the feeling of stereotyping and weird, vague directions; don't just look at this user, look at this part of this user
  • Tom Butcher: Is music rec really a unique snowflake? -- Every domain is unique. One thing: a bad rec in music costs 2 minutes, a bad film rec costs you 2 hours; music has a lower penalty cost for bad recs. Also a difference in features: will sonic features get you to preference? Probably not in music [I think this is a thing which may improve...]
Update 2 (10:31am):
Session 1: Time Dependency
  • personal example showing the difference between an early-day and a late-night playlist
  • trying to link two concepts: time (day, hour, weather?) and music track selection
  • few papers on this idea -- borrows from human dynamics -- trying to enable playing music 'at the right moment' -- explores circular stats
  • Circular stats (equations in the paper at the link) basically transform raw data by a periodicity (days, weeks) -- a minimal sketch follows this list
  • Circular stats have tools analogous to traditional stats -- hypothesis tests, for instance
  • Data for evaluation is the full listening history of 992 unique Last.fm users, with artist/title + time of day (ToD); also got genre via track.getTopTags, keeping the genre tags (a tag-fetching sketch also follows the list) -- discarded users without enough data
  • scraped about half the data
  • attempt to make predictions: use two years of data to predict the ToD of plays in the following year
  • results: by day, about 2.5x better than chance; by hour, about 3-5x better than chance (moving from half-hour to hour tolerance roughly doubles this)
  • note that these figures are aggregate; some users are very predictable in this way, some are not
  • Conclusion: temporal patterns can be predicted -- not just what but when. Plugs the Last.fm clocks
  • Q (dunno who asked): what about user-to-user offsets (e.g. if one user gets up at 6am and another at 8am, 8am means something different to each)? Currently can't do this; we'd need sensor data. Would be sweet if we could, though note that the predictions are per-user, so this is to some extent already dealt with
  • Q (again, people, say who you are): Method issue -- when comparing day vs. hour, there's a percentage difference in the error tolerance? Sure; this could work -- look at the baseline comparison...
  • Q (Eugenio Tacchini): I tried this a while ago with aggregated data and didn't find much spread; do you think aggregation is the issue? Yeah, it must be specific to the user: right time + right user, not just right user
  • Q (Klaas): do you think it would work with less data (can't wait 2 years)? Probably. This was a very conservative methodology; could probably get by with maybe three months. For this work we wanted lots of data to make things clear
  • Q (seriously, ID yourselves, guys): did you use a popularity filter? No; tested whether preference for a genre is different from the average for that genre
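Not the authors' code, but here's a minimal sketch of the circular-stats idea as I understand it: map hours of day onto a 24-hour circle, then take the circular mean and the mean resultant length (how concentrated the plays are). The play times are made up for illustration:

```python
import math

def hours_to_angles(hours):
    """Map hours of day (0-24) onto angles in radians on the unit circle."""
    return [2 * math.pi * (h % 24) / 24 for h in hours]

def circular_mean(angles):
    """Return the circular mean angle and the mean resultant length R.

    R is near 1 when the plays cluster at one time of day and near 0
    when they are spread uniformly around the clock.
    """
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    return math.atan2(s, c), math.hypot(c, s)

def angle_to_hour(angle):
    """Map an angle back onto the 24-hour clock."""
    return (angle % (2 * math.pi)) * 24 / (2 * math.pi)

# Made-up plays clustered around midnight, wrapping past it:
plays = [22.5, 23.0, 23.5, 0.25, 1.0, 22.0]
mean_angle, r = circular_mean(hours_to_angles(plays))
print(angle_to_hour(mean_angle))  # ~23.4
print(r)                          # ~0.96: this listener is very predictable
```

A plain arithmetic mean of those hours lands mid-afternoon (~15.4), which is exactly the wrap-around problem the circular transform avoids.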
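For the genre step, track.getTopTags is a standard Last.fm web-service method; here's a rough sketch of what fetching the tags might look like (the API key is a placeholder, and the response handling is my guess, not the authors' actual pipeline):

```python
import requests  # third-party: pip install requests

API_KEY = "YOUR_LASTFM_API_KEY"  # placeholder -- register for your own key

def top_tags(artist, track):
    """Fetch a track's top tags from Last.fm via track.getTopTags."""
    resp = requests.get(
        "http://ws.audioscrobbler.com/2.0/",
        params={
            "method": "track.getTopTags",
            "artist": artist,
            "track": track,
            "api_key": API_KEY,
            "format": "json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    tags = resp.json().get("toptags", {}).get("tag", [])
    return [(t["name"], int(t["count"])) for t in tags]

# e.g. keep the highest-weighted tag as a rough 'genre' for the track
tags = top_tags("Radiohead", "Karma Police")
print(tags[0] if tags else "no tags")
```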
Break time, then my talk. No notes for my talk, as I'm talking...

Update (12:16pm): I was without my machine for the social-tag session, not just my talk. I'll get my handwritten notes into another post, but for now here are the papers:
The next paper is being skipped, since the author was unable to attend due to illness.

Now joining the presentation already in progress by Audrey Laplante:
  • qualitative study of adolescents
  • 'Did your music taste change significantly in the last three years?' Yes. Whys: new boyfriend; new school, therefore new friends; music as an important discussion topic
  • "Who in your 'gang' or group exerts the most influence on others in terms of music?" -- 3 self-identified. Characteristics: highly invested in music, good communicators, willing to share info. People who are opinion leaders want to stay opinion leaders and will invest heavily in effort
  • in other domains, work shows that weak ties are more important than strong ties for finding new information, almost all the time -- for 2 participants weak ties were important -- for the others, strong ties with significantly different social networks were important -- music as a vehicle for social interaction
  • strong ties have different roles -- not important for discovery, but critical for legitimization of musical taste
  • similar and reliable social connections are critical
  • social network maps (pic forthcoming...)
  • unknown how common these results are (same survey); exact implications for recommenders as yet unknown
  • Q (unknown): weak ties vs. strong ties -- how do you define the difference? Not really about newness, but it's entirely possible with new detail
  • Q (Claudio): What kinds of systems are implied by this work? Not necessarily a different system for adolescents. Tight connectivity is critical; perhaps the difference is that strong ties may become more critical
  • Q (Claudio): does the notion that music describes you change as you get older? Not really, actually; adolescents are interested in individual uniqueness
  • Q (Mark L): are online social networks somewhat different? Yes and no. On Facebook you can find relatives, but noise is a big problem, and trust is not known
  • I asked about using graph difference. Answer: it could work; also other automatic methods...
lunch now. I'll make a new post for the afternoon session.

Updated again (5:14pm): Eugenio Tacchini is Italian, not Finnish (oops).

4 comments:

zazi said...

Are the slides of the keynote speech somewhere available?

ben said...

Dunno, but I'll ask the speaker and report back...

ben said...

zazi - link to the slides is now in the post!

zazi said...

thanks mate ;)