Tuesday 5 April 2011

Free Beer: A Plea for Open Data [About Beer]

(I wrote most of this right after my viva, but got a bit sidetracked...)
Hey look, my first blog post about beer (or at least beer metadata).

So a few weeks ago, Tim Cowlishaw (@mistertim) stated this on twitter:
To which I replied with this:


(Scraping is a way to get the info a human reads, say on a website, into a format a computer program can read. More on why I'd want to do that in a minute...)
Which was followed by what I thought was a reasonable request:

Now at this point I had figured that was the end of it. Both ratebeer and beer advocate ignored my requests for data a year or so ago, so I was expecting the same this time. However, beer advocate responded via twitter:

(Note that the link to the tweet no longer resolves, because beeradvocate decided to delete this tweet a couple hours later. The screen capture was taken from my twitter client just after the deletion...)
Now, I hadn't been prepared for such knee-jerk nastiness regarding seemingly reasonable data requests, and neither had Tim, as he quickly pushed out this series of messages:

While I and others pushed some similar responses, Tim's messages summarize things really well: boo, disdain, technical critique. (After this both Tim and I appeared to be blocked from following beeradvocate...)

The crux of all this is that I (and it would appear others, but from here I speak only for myself) would love to have access to structured data (as a service or, better yet, as documents) about beer and the people who drink it.

I'd love to build browser-based applications that do cross-domain recommendation of, say, beer and music. But in order to do that I'd need data about people's taste, in beer and music. Lots of options to work with in the music domain. But beer? Machine-readable beer data is harder to find.
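To make the half baked idea a bit more concrete, here is a minimal sketch of what cross-domain recommendation could look like if the data existed. Everything in it is an assumption for illustration: the users, the music tags, the beer ratings and the function names are all made up, and the approach (weight each person's beer ratings by how much their music taste overlaps with yours) is just one simple option, not how any real site works.

```python
from collections import defaultdict

# Hypothetical toy data: music each user likes, and their beer ratings (1-5).
MUSIC_LIKES = {
    "ann": {"jazz", "post-rock"},
    "bob": {"jazz", "techno"},
    "cat": {"folk"},
}
BEER_RATINGS = {
    "ann": {"stout": 5, "pilsner": 2},
    "bob": {"stout": 4, "ipa": 5},
    "cat": {"pilsner": 5},
}


def jaccard(a, b):
    """Overlap between two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_beers(music_taste, top_n=3):
    """Rank beers by the ratings of users with similar music taste."""
    totals = defaultdict(float)   # similarity-weighted rating sums per beer
    weights = defaultdict(float)  # sum of similarities per beer
    for user, likes in MUSIC_LIKES.items():
        sim = jaccard(music_taste, likes)
        if sim == 0.0:
            continue  # no shared music taste, so ignore this user's beers
        for beer, rating in BEER_RATINGS.get(user, {}).items():
            totals[beer] += sim * rating
            weights[beer] += sim
    scores = {beer: totals[beer] / weights[beer] for beer in totals}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda b: (-scores[b], b))[:top_n]


if __name__ == "__main__":
    print(recommend_beers({"jazz"}))
```

The code is the easy part. The two dictionaries at the top are the hard part, because they stand in for exactly the kind of structured data that is easy to get for music and hard to get for beer.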

Both ratebeer and beer advocate have a great deal of this data, it's just not (openly) machine readable. In ratebeer's case this is entirely crowdsourced and for beer advocate this is true for their community pages. There's a compelling case that crowdsourced data should be as open as possible, given that the data itself comes from the public at large. But beyond the moral case, opening your data means that the wide world of evening and weekend software developers/architects/designers/whatevers (many have the same job during the day) will expand what is possible with a site's data in a way that will benefit said site (like my half baked idea above). This, in essence, is the commercial argument for supporting open data and has been shown to be extremely effective in other domains (say, to pick one at random, music). And there is a simply massive spread of open data APIs (again, both service and document) but barely any covering data about my favourite topic that isn't music: beer. So what do you say, ratebeer or beeradvocate? How about some nice structured data?

note: I should mention that there are a couple sites that are beer related and open: untappd and beerspotr. Both are good sites, though neither is quite to the point of hitting critical mass in terms of data coverage and usefulness just yet. Either might at some point in the future, but ratebeer and beeradvocate already have; the data just isn't accessible.

1 comment:

Tim said...

Totally agree with all that, Ben. The main points that bug me are:

1) A moral problem with sites that gain their value from data contributed by a community of users not playing nice and allowing that same community access to the data

2) The total shortsightedness of it in economic terms. The only way in which someone could damage a data provider's business using their own data is to do their job *better* than them - and the best way to stop that happening is to make sure you're consistently offering an excellent product. Anything else a third party could do with the data actually adds value for the business supplying the data (assuming attribution, which could very easily be enforced through CC / Science Commons licences or similar).

Beer's such an interesting domain for collaborative filtering / information retrieval work - There's plenty of really interesting subjective data about taste through reviews, ratings etc, formal beer style taxonomies like BJCP, not to mention all the interesting chemical / biological information that you can glean from a recipe and a little technical information about ingredients and process. There's also a ton of interesting questions that could be asked about Beer as a domain, all we need is the data!