Showing posts with label social networking tools. Show all posts

Monday, 1 March 2010

IEEE-THEMES --shameless self promotion--

I'm going to be presenting work at IEEE-THEMES, a workshop co-located with ICASSP, on March 15th in Dallas, TX. The talk is associated with an article to be published in the August issue of Selected Topics in Signal Processing, which is a special issue on signal processing and social networks. Here's the title/abstract (note: the link is to a preprint; the camera-ready version isn't due until after the talk, so the paper may well change a touch...):


Abstract—This paper presents an extensive analysis of a sample of a social network of musicians. The network sample is first analyzed using standard complex network techniques to verify that it has similar properties to other web-derived complex networks. Content-based pairwise dissimilarity values between the musical data associated with the network sample are computed, and the relationship between those content-based distances and distances from network theory explored. Following this exploration, hybrid graphs and distance measures are constructed, and used to examine the community structure of the artist network. Finally, results of these investigations are presented and considered in the light of recommendation and discovery applications with these hybrid measures as their basis.
The paper mostly covers content that has been discussed elsewhere (much of it with Kurt Jacobson), refactored for a broader audience and with wider narratives in mind. That said, there are some notable new findings in the paper as well. We have run another acoustic dissimilarity measure across the entire set (the 2009 MIREX entry in audio music similarity using Marsyas), which for the most part confirms our earlier findings (that acoustic similarity and social similarity [mostly] aren't linearly correlated, and that community genre labeling becomes more homogeneous [again, mostly] when using the audio similarity as a weight). Additionally, we have broadened our comparison metrics to include an examination of the mutual information between the different dissimilarity sets. This also basically confirms our earlier findings, though mutual information provides a very satisfying level of nuance that is not possible from simply testing for linear correlation (using Pearson's coefficient), especially given that our data is quite far from normally distributed.
So, if you're planning to be at ICASSP, I'd highly recommend IEEE-THEMES (the rest of the program looks to be very interesting as well...), and if you aren't going to be in Dallas, there are a few options for you:
  1. If you're in London right now, you can come to Goldsmiths today at 4pm, room 144 in the main building, where I'll be giving a trial run of the talk.
  2. Slides (and perhaps some video) will be made available at some point (probably just after the talk is given).
  3. IEEE is running a pay-to-watch live stream of THEMES, so there's that as well.
Generally, if you're going to be in Dallas from March 15-19, much discussion can happen in person. Also, between now and then I'll be doing some traveling (tomorrow till 6 March I'll be at UIUC, then from there till the 14th of March I'll be in San Diego), so if any readers are interested in some in-person discussion and our locations overlap, let me know and perhaps something can be arranged.
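To make the point about mutual information versus Pearson's correlation a bit more concrete, here is a minimal sketch on synthetic data. It is not the analysis from the paper: the data, the equal-width binning, the bin count and all function names are assumptions chosen purely for illustration, and it is written for Python 3 using only the standard library.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson's r, which only measures linear association."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def binned(values, bins):
    """Map each value to the index of an equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against constant input
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys, bins=8):
    """Plain histogram estimate of I(X;Y) in bits (illustrative only)."""
    bx, by = binned(xs, bins), binned(ys, bins)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # sum over occupied cells of p(i,j) * log2(p(i,j) / (p(i) * p(j)))
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
ys_dependent = [x * x for x in xs]                     # fully determined by x, but not linearly
ys_independent = [rng.uniform(0.0, 1.0) for _ in xs]  # unrelated to x

r_dep = pearson(xs, ys_dependent)
mi_dep = mutual_information(xs, ys_dependent)
r_ind = pearson(xs, ys_independent)
mi_ind = mutual_information(xs, ys_independent)

print("dependent:   r=%.3f  MI=%.3f bits" % (r_dep, mi_dep))
print("independent: r=%.3f  MI=%.3f bits" % (r_ind, mi_ind))
```

Pearson's r comes out close to zero for both pairs, so it cannot tell them apart, while the mutual information estimate is clearly larger for the dependent pair. A histogram estimator like this one is biased and sensitive to the bin count, so it should be read as a toy rather than a recommended method.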

Sunday, 12 October 2008

mypyspace status update

So MyPySpace has been getting a facelift.  Kurt, with some input from users (we apparently have users!  Who knew.), has been refactoring the RDF translators and fine-tuning the myspace ontology as well.  Most (all?) of these changes are also being reflected in the live service.  While all of this has been going on, I've refactored the page scraping, crawling and downloading into a much more sensible class architecture from its former stream-of-consciousness implementation (I believe the polite description is 'research code').  It still has quite a ways to go (alpha!) but it's starting to resemble an actual library.  If you want to play with my refactored bits you can check them out like this:

> svn co https://mypyspace.svn.sourceforge.net/svnroot/mypyspace/myspaceCrawler/trunk/ myspaceCrawler

then you can do nifty things like the following (inside your favorite python interpreter or script, I'm using ipython here):

In [1]: import mpsUser

In [2]: gearmonkey = mpsUser.mpsUser('http://www.myspace.com/gearmonkey')

You simply give the class a valid myspace user URL to initialize it (this is my artist page; if you want to play with this, don't feel the need to listen to my music...).

In [3]: gearmonkey.isArtist
Out[3]: True


In [4]: gearmonkey.downloadTracks('~/Music/mpsUsertest/gearmonkey/')
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/gearmonkey/1_Cheeky.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/gearmonkey/2_TrainTune.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/gearmonkey/3_Give Way.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/gearmonkey/4_En La Selva Mvt II GMO vip.mp3; creating tag from scratch
Out[4]: (4, 4)

As shown above, you can find out whether your user is an artist by checking the boolean isArtist.  If it's an artist, you can download their songs with downloadTracks.  The return value of that call is a tuple of (songs successfully downloaded, downloads attempted).
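As a small illustration of working with that return value, the sketch below unpacks a (downloaded, attempted) tuple and turns it into a status message. The helper name summarize_downloads is hypothetical and not part of the library; it only assumes the tuple layout described above.

```python
def summarize_downloads(result):
    """Turn a (downloaded, attempted) tuple into a short status string.

    Hypothetical helper: it relies only on the tuple layout described
    in the post, not on anything else in mpsUser.
    """
    downloaded, attempted = result
    if attempted == 0:
        return "no tracks found"
    if downloaded == attempted:
        return "all %d tracks downloaded" % attempted
    failed = attempted - downloaded
    return "%d of %d tracks downloaded, %d failed" % (downloaded, attempted, failed)


print(summarize_downloads((4, 4)))  # the result from the session above
print(summarize_downloads((2, 4)))  # a made-up partial failure
```

With a real user object, the call would presumably look like summarize_downloads(gearmonkey.downloadTracks(path)).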

In [5]: gearmonkey.songs[0].title
Out[5]: u'Cheeky'

As part of the download process, each song is an instance of the class mpsSong (more on that class in a bit).

You can use the mpsUser class to crawl the artist network like this:

In [6]: artistFriends = gearmonkey.findArtistTopFriends()

In [7]: artistFriends
Out[7]: 
[mpsUser.mpsUser instance at 0x1a3a7b0,
 mpsUser.mpsUser instance at 0x1c5d788,
 mpsUser.mpsUser instance at 0x1a3ad50,
 mpsUser.mpsUser instance at 0x1a3a968,
 mpsUser.mpsUser instance at 0x1c7fc88]

In [8]: artistFriends[0].artist
Out[8]: u'Mike'

In [9]: for entry in artistFriends:
   ...:     print entry.artist
   ...:    
Mike
Otto Von Schirach
GEIN
The Dead Hookers' Bridge Club
EVOL


In [10]: artistFriends[2].downloadTracks('~/Music/mpsUsertest/GEIN/')
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/1_Life Of Sin GEIN edit.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/2_Deadly Algorhythm GEIN Remix.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/3_GEIN KJ Sawka Break the Enemy.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/4_GEIN  Warden.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/5_GEIN vsThe ChosenAbomination.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/6_GEIN  Hell Audio rmx.mp3; creating tag from scratch
Out[10]: (6, 6)

In [11]: artistFriends[2].topFriends
Out[11]: 
[u'11187934',
 u'2123795',
 u'2177245',
 u'706581',
 u'20492111',
 u'66601290',
 u'5017015',
 u'207669100',
 u'52365642',
 u'2186134',
 u'3378431',
 u'55609497',
 u'30244',
 u'26629700',
 u'80613962',
 u'74772580',
 u'28841051',
 u'317327']

In [12]: geinArtistFriends = artistFriends[2].findArtistTopFriends()

In [13]: geinArtistFriends
Out[13]: 
[mpsUser.mpsUser instance at 0x1d83a30,
 mpsUser.mpsUser instance at 0x1d95710,
 mpsUser.mpsUser instance at 0x1d99198,
 mpsUser.mpsUser instance at 0x1d95be8,
 mpsUser.mpsUser instance at 0x1d956c0,
 mpsUser.mpsUser instance at 0x1e6fa80,
 mpsUser.mpsUser instance at 0x1e81b98,
 mpsUser.mpsUser instance at 0x1da7080,
 mpsUser.mpsUser instance at 0x1e81b70,
 mpsUser.mpsUser instance at 0x1f2dee0]

In [14]: for friend in geinArtistFriends:
   ....:     print friend.artist
   ....:    
EVOL
GUERILLA®
THE GUN
Habit Recordings
Mumblz / Delusional
Tech Itch
Lost Soul Recordings
None
Donny
NECRO THE SEXORCIST SPECIAL EDITION CD/DVD SOON!!!
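
The hop-by-hop walk above could be automated as a breadth-first crawl. The following is only a sketch: crawl_artists and FakeUser are hypothetical names, and FakeUser stands in for mpsUser so that the example runs without network access. The only parts taken from the session are the findArtistTopFriends method and the artist attribute. Note that EVOL turns up in both friend lists above, which is why the crawl keeps track of artists it has already seen.

```python
from collections import deque


def crawl_artists(seed, get_friends, get_key, max_depth=2):
    """Breadth-first walk; returns {key: depth} for every artist reached."""
    depths = {get_key(seed): 0}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        depth = depths[get_key(user)]
        if depth >= max_depth:
            continue  # do not expand beyond the requested radius
        for friend in get_friends(user):
            key = get_key(friend)
            if key not in depths:  # skip artists that were already seen
                depths[key] = depth + 1
                queue.append(friend)
    return depths


class FakeUser(object):
    """Stand-in for mpsUser backed by a tiny hard-coded friend graph."""

    graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": ["e"], "e": []}

    def __init__(self, name):
        self.artist = name

    def findArtistTopFriends(self):
        return [FakeUser(name) for name in self.graph[self.artist]]


reached = crawl_artists(
    FakeUser("a"),
    get_friends=lambda user: user.findArtistTopFriends(),
    get_key=lambda user: user.artist,
)
print(sorted(reached.items()))
```

Keying on the artist name is a simplification for the toy graph. One of the real results above prints None as its artist name, so a profile URL or user ID would likely make a safer key in practice.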


And so on and so forth.  Once you've initialized the songs for an artist, you can use the mpsSong class structure to find things out about the songs as well:

In [16]: gearmonkey.songs
Out[16]: 
[mpsUser.mpsSong instance at 0x1a155d0,
 mpsUser.mpsSong instance at 0x1a2ff08,
 mpsUser.mpsSong instance at 0x1a338f0,
 mpsUser.mpsSong instance at 0x1a338c8]

In [17]: for song in gearmonkey.songs:
   ....:     print song.title + " by " + song.parent.artist + " has been played " + song.playcount + " times." 
   ....:    
Cheeky by G_M_O has been played 117 times.
TrainTune by G_M_O has been played 168 times.
Give Way by G_M_O has been played 88 times.
En La Selva Mvt II GMO vip by G_M_O has been played 9 times.

In [18]: 


There are also some simple hooks to call fftExtract on the songs of an artist, but I'll save those bits for another post.  One quick note: I don't believe we've fixed the bug that prevents song downloads in the US (and maybe Canada), but the URL requests have been changed slightly, so if anyone tries it over there let me know.  All the scraping should be fine in the States and everything should work everywhere else.  Also, you need the mutagen ID3 tag library installed prior to using this.

If any readers do give this a try, let me know down below if you have any thoughts (especially interface-related ones).

Wednesday, 28 November 2007

MyPySpace update

I have finally set down an initial release of the musicGrabber tool, which is being developed by kurtjx and me over on sourceforge. You can now grab the python tool without getting your hands dirty in our subversion repository. Go take a look.