You are browsing the archive for Oxford Open Science.

Content Mining: Scholarly Data Liberation Workshop

- December 14, 2013 in events, Oxford Open Science, Research, Tools

The November Oxford Open Science meeting brought over 20 researchers together for a ‘Content Mining: Scholarly Data Liberation Workshop’.

Iain Emsley and Peter Murray-Rust kicked off proceedings by presenting their work on mining Twitter and academic papers in chemistry and phylogenetics respectively.

Next we tried out web-based tools such as Tabula for extracting tables from PDF (we were fortunate enough to have Manuel Aristarán of Tabula joining us remotely via Skype) and ChemicalTagger for tagging and parsing experimental sections in chemistry articles.

OOS

We then got down to business with some hands-on extraction of species from HTML papers and mentions of books on Twitter using regular expressions. All code is open source so you are welcome and encouraged to play, fork and reuse!

Peter’s tutorial and code to extract species from papers can be found on bitbucket and the relevant software and command line tools have helpfully been bundled into a downloadable package. Iain has also documented his flask application for Twitter mining on github so have a go!

If this has whet your appetite for finding out more about content mining for your research and you’d like to ask for input or help or simply follow ongoing discussion then join our

open content mining mailing list


Some furry friends joined in the efforts - met Chuff the OKF Okapi and AMI the kangaroo

Some furry friends joined in the efforts – met Chuff the OKF Okapi and AMI the kangaroo