content-based stuff now:
Dmitry Bogdanov, Martín Haro, Ferdinand Fuhrmann, Emilia Gómez and Perfecto Herrera
- Sim is not rec. need similarity
- can we improve content based rec by merging pref data?
- gmm + pref model
- ask user for small set of tracks that specify the user's preference by example
- get bag of frames on these
- SVMs to get semantics (probabilistic)
- in this semantic space, search for tracks
- can search in a variety of ways (use of Pearson's correlation is taken from prev work)
- for eval, compare our method to a bunch of existing methods: content-based, contextual, random
- some users did a test to get a pref set (varies from 19 to 178 tracks per user); this takes a long time
- get lots of tracks from all the methods, shuffle, stick in front of user, ask lots of Qs per track
- created three categories based on the evals: Hits, trusts, fails
- hits - user likes, is new
- trusts - user likes, is not new
- fails - no to all
- unclear - the rest (18%)
- A good system should provide many hits and some trusts avoiding fails
- in the results, last.fm (via api) is very good for hits and trusts
- everyone else was bad at trusts
- the new method was best for hits among the non-last.fm methods, but last.fm draws on a different set of music, so they're better
- proposed semantics offer an improvement over pure timbral features
- but still inferior to industrial approaches, though the proposed work improves considerably; perhaps a good way to cold start
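[aside: a minimal sketch of the hits / trusts / fails / unclear bucketing described above. The notes don't record the actual questions asked per track, so the answer keys `liked`, `familiar` and `want_more` are hypothetical, and reading "no to all" as "every answer is no" is an assumption, not the authors' definition.]

```python
from collections import Counter


def categorize(answers):
    """Map one track's answers to 'hit', 'trust', 'fail' or 'unclear'.

    `answers` is a dict of booleans. The keys 'liked', 'familiar' and
    'want_more' are hypothetical stand-ins for the real questionnaire.
    """
    liked = answers["liked"]
    familiar = answers["familiar"]
    if liked and not familiar:
        return "hit"      # user likes it and it is new to them
    if liked and familiar:
        return "trust"    # user likes it but already knows it
    if not any(answers.values()):
        return "fail"     # assumed reading of "no to all"
    return "unclear"      # everything else


def summarize(all_answers):
    """Count how many rated tracks fall into each category."""
    return Counter(categorize(a) for a in all_answers)
```

Running `summarize` over the ratings collected for one method gives the per-category counts that would be compared across methods (many hits, some trusts, few fails).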
Q (oscar) I don't understand the last.fm part. Why didn't you use it for sim?
we tried, couldn't get enough info
(oscar follow up) low trust on the content, do you think it's tied to a lack of transparency?
maybe, but our definition of trust just meant user likes and knows.
Q() was the SEM-ALL about finding songs that are close to any or all?
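[aside: a rough sketch of searching the semantic space with Pearson's correlation, as described in the talk. Each track is assumed to be a vector of SVM class probabilities. The answer to the any-vs-all question wasn't recorded, so `rank_all` (compare against the mean of the preference set) and `rank_any` (compare against the closest preference track) are two hypothetical readings, not the paper's definitions.]

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def rank_all(pref_set, candidates):
    """Rank candidates by correlation with the mean preference vector."""
    dims = len(pref_set[0])
    profile = [sum(t[d] for t in pref_set) / len(pref_set) for d in range(dims)]
    scored = [(pearson(profile, vec), name) for name, vec in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]


def rank_any(pref_set, candidates):
    """Rank candidates by their best correlation with any single preference track."""
    scored = [
        (max(pearson(p, vec) for p in pref_set), name)
        for name, vec in candidates.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]
```

Here `pref_set` is a list of semantic vectors for the user's example tracks and `candidates` maps a track name to its vector; both functions return names ordered from most to least similar.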
Pedro Mercado and Hanna Lukashevich
Hanna is presenting
- clustering can help you swim in the sea of data
- users can fix incorrect clusters, positive feedback
- system diagram:
- similarity can be considered as a graph, then you can do random walks, calc eigenvalues etc.
- but what if this user doesn't care about some things? User pref based feature selection.
- in the given space, you can then find distance (paper uses Pearson's but other dist could be used)
- constrain the space (tricky math, see paper...)
- eval: used the MIREX 04 content description data
- constraints from genre labels
- using test/train as an example: what's in the constraint space, what isn't
- mutual information, something else I didn't catch
- some graphs showing that there's more awesome with presented method
- when looking at outliers, things are less clear but still seem positive
- [graphs are page 6 of the pdf, have a look for details]
- to wrap up: ML approaches can improve recs at least with our simulated user...
- our clustering methods are speedy; scale is tricky, but since our matrix is sparse it should be doable
- Way better than random constraints
- future work: stick constraints in the feature selector. We did this, to appear in ICML; gives significant improvement but causes some trouble, read paper for detail [excellent ICML tease...]
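[aside: the eval mentioned mutual information. A minimal sketch of how cluster assignments could be scored against genre labels that way. This is the textbook definition in nats; whether the paper uses this exact form or a normalized variant is an assumption. The cluster ids and genre names below are made up for illustration.]

```python
import math
from collections import Counter


def mutual_information(clusters, labels):
    """Mutual information (in nats) between cluster ids and reference labels."""
    n = len(clusters)
    joint = Counter(zip(clusters, labels))   # co-occurrence counts
    cluster_counts = Counter(clusters)
    label_counts = Counter(labels)
    mi = 0.0
    for (c, l), count in joint.items():
        p_joint = count / n
        p_c = cluster_counts[c] / n
        p_l = label_counts[l] / n
        mi += p_joint * math.log(p_joint / (p_c * p_l))
    return mi


# Illustrative, made-up data: clusters that match the genres exactly,
# and clusters that are independent of them.
genres = ["rock", "rock", "jazz", "jazz"]
perfect = mutual_information([0, 0, 1, 1], genres)
unrelated = mutual_information([0, 1, 0, 1], genres)
```

A clustering that lines up with the genre labels scores higher than one that ignores them, which is the sense in which a constrained clustering could be compared against random constraints.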
-- coffee and demos now...