Recruiting Scientists

- June 10, 2014 in Panton Fellowships, Panton Principles

Working out where we should install our sensors

Anyone who’s been following the progress of my fellowship through my blog posts will know that I have been working towards getting sensors into schools for a while now. Well, a couple of weeks ago I finally ran an introductory session with some primary school pupils (aged 8-11) at Kibworth CE Primary School in Leicestershire.

I had been developing the introductory material for a few weeks prior to the lesson with some help from the teachers at Kibworth, who have been really responsive and open to my ideas. We decided we wanted this activity to be very student-led, with the pupils planning much of the experiment themselves, to encourage them to think in more depth about why we were doing this. We titled the introductory session “What’s in the air you breathe?”.

Snapshots of the introductory presentation "What's in the air you breathe"

I started the session by introducing the topic of air quality to the students, from the very basic first discussions of what makes up the air to talking about emission sources and the health effects of air pollution. The introduction lasted less than 20 minutes and I encouraged lots of discussion with the students, asking them specific questions to work out what knowledge they had and to allow them to teach one another. The response to this was great and I was impressed by how much they knew about the atmosphere: one student explained the greenhouse effect to us and another mentioned the ozone hole. I hadn’t expected them to know so much about the topics we were discussing and so I was really pleased when I started talking to them.

We then showed the students the equipment that they would have in school and explained what everything did. It was then over to the students to work out in groups where they wanted to install all of the sensors. To make this decision I asked them to think about where they thought the sources of air pollution around the school would be and where there are people who would be breathing it in. They quickly identified that the highest levels of pollution were likely to be in the car park, near the road and at the bottom of the playground, which was relatively close to a train line. They also told me that in the morning and afternoon lots of people would be walking through the car park and at lunchtimes the students would all be in the playground.

At this point one of the fundamental hurdles of being a fieldwork scientist also had to be explained to the students: some of the sensors need mains power, and so although the school gates may have been a good position in terms of producing interesting data, logistically it wasn’t possible to power the sensor that far from the school building.

After lots of enthusiastic discussion and some expectation management they decided that they would like to put the sensor in three positions and so the pupils planned to move it around the school during the term. These were:

  1. In the playground near to the car park and the chicken coop: they wanted to see what levels of pollution the chickens, as well as themselves, were being exposed to during playtime.
  2. At the bottom of the playground near to the train tracks.
  3. In the main playground where most of the students played at lunchtimes.

Lots of enthusiastic ideas…

The sensors are now with the school waiting to be installed in the next few weeks at which point data will start streaming in. While the students are busy being the scientists I need to get on with planning a data analysis session that we can run before the summer holidays. Overall I’m really pleased with how the session went and look forward to going back into the school soon.

 

Panton Fellow Update: Introduction to Open Research Data

- May 5, 2014 in Panton Fellowships, Panton Principles

In my first three-month update report I discussed the book I’m working on as the major output of my Panton Fellowship. Entitled Introduction to Open Research Data, the book explores both the practical and theoretical issues associated with Open Data from a range of general and disciplinary viewpoints. The book will be Open Access, available in various ebook formats and low-cost print editions, and remixing will be encouraged – particularly the subject-specific guidance, which disciplinary communities can build upon as a foundation for a collection of resources on Open Data.

Whilst I am still awaiting a couple of contributions, I am happy to be able to share a provisional table of contents for the book (chapter topics on the left and authors on the right; chapter titles still TBD):

1. Foreword: Introduction to the Panton Fellowships
2. Introduction to the book and the Panton Principles – Sam Moore (with input from the original Panton group)
3. Open Content Mining – Peter Murray-Rust and Jenny Molloy
4. Open Data and Neoliberalism – Eric Kansa
5. Data Sharing in a Humanitarian Organization: The Experience of Médecins Sans Frontières – Unni Karunakara (previously published in PLOS Medicine)
6. Open Data in Earth/Climate Sciences – Sarah Callaghan
7. Open Data in Psychology – Wouter van den Bos, Mirjam Jenny and Dirk Wulff
8. Digital Humanities and Linked Open Data – Jodi Schneider
9. Open Data in Palaeontology – Ross Mounce
10. Open Data in the Health Sciences – Tom Pollard
11. Open Data in Economics – Velichka Dimitrova
12. Why Open Drug Discovery Needs Four Simple Rules for Licensing Data and Models – Antony J. Williams, John Wilbanks and Sean Ekins (previously published in PLOS Computational Biology)

I won’t go into more detail about the content of each chapter, though authors were given free rein to approach the subject however they saw fit. Furthermore, I sought permission from the authors of the previously published pieces, though they were originally published under CC BY, and all were happy for their contributions to appear in the book.

I’m super excited for how this is coming together and I hope to have the book published by August. I will of course be posting updates along the way. Get in touch if you have any questions!

A live AQ data feed- finally!

- February 19, 2014 in Panton Fellowships, Panton Principles

As anyone who has ever done lab work will know, it always takes longer than you expect! Well that’s definitely the case with my sensor calibration experiments. We have got there eventually though and the calibration is happening this week. So while all of the delays were happening there I decided to get a webpage sorted that I can use as a live data feed for the sensors and also somewhere to download the data. Version 1 of my webpage can be found here. It definitely needs a bit more work but it currently shows data from the last three days and will soon have a way of downloading the data directly.

Whilst we’re in the calibration stage the data might look a little strange but I’ll be putting updates on the webpage regularly and will blog when the sensor is installed in the school and is collecting data. In the next few weeks I’m planning to visit the school that I am working with to decide on a deployment location with the pupils. Both the school and I want the pupils involved in the science as much as possible and so they will be helping me to pick the best location for the sensor, to install the sensor, to take measurements and then to analyse them. We’re hoping that this level of involvement will not only help to keep the pupils engaged but will also teach them what it’s like to be real scientists.

The second facet to my work is the general public engagement aspect. I’m hoping to engage with members of the public who live or work close to the monitoring site to make them aware of the air that they breathe. This will probably start with the parents of the pupils involved in the project but will hopefully expand from there.

I’ve definitely reached an exciting point in my project now so watch out for updates…

An Update on my Panton Fellowship

- January 8, 2014 in Panton Fellowships, Panton Principles

So as month four of my Fellowship begins it’s time to recap and reflect on what I’ve done so far and what’s left for me to still do…

Over the last three months I’ve met and spoken to lots of interesting people. The world of open science/open data is very new to me, so making these contacts has been invaluable.

So what else have I managed to achieve? A lot of the first few months was spent sourcing the right sensors for this project and then getting them to work. As of the week before Christmas I have a working sensor which now needs calibrating and then it’s time for it to be deployed (yippee!). I’ve been working with an MChem student and other colleagues on a calibration plan which can be used not only for my sensors but also for the large selection of different ones we are now building up. We’re planning to run calibrations this month and then install the sensor in the first school in February.

As I’ve mentioned before, the sensors’ final destination will be a school in Leicester and so I have also been in contact with potential schools and have had a great response. The first school I’ll be working with is based just outside Leicester and they are as excited as I am about this project. We’re planning some introductory sessions for the school, outlining the project to pupils, and then some data analysis sessions every term to look at the data with pupils and get them really thinking about what they are measuring. Not only will this be a great way of teaching them about air quality issues, but it will also reinforce certain areas of the curriculum.

Alongside this I have been involved in the development of some “homemade” air quality sensors which we are hoping to deploy in Leicester this year. This design looks set to be far cheaper than any currently on the market and the first prototype will be ready for testing next week.

So it’s been a busy few months: I’ve passed my PhD viva, started a new job and begun my Panton Fellowship. But it’s been great and I’m really looking forward to seeing what the next three will have in store.

My previous blog posts can also be found at the links below:

http://science.okfn.org/2013/10/03/a-quick-hello-from-a-panton-fellow/

http://science.okfn.org/2013/11/01/my-first-month-as-a-panton-fellow/

http://science.okfn.org/2013/12/11/citizen-science-project-for-air-quality-measurements/

 

Panton Fellow Update: Samuel Moore

- January 8, 2014 in Panton Fellowships, Panton Principles, Publications

My first few months as a Panton Fellow have flown by and so I wanted to provide a quick update on the work I’ve been doing. Whilst it’s not possible to discuss everything, I thought it would be good to list some of the larger projects I’ve been working on.

Early into the fellowship I made contact with two of the Open Economics Working Group coordinators, Velichka Dimitrova and Sander Van Der Waal, to discuss how best to encourage Open Data in economics. Whilst we thought that a data journal could be a good way of incentivising data sharing, we also thought it would be sensible to conduct a survey of economists and their data sharing habits to see if our assumptions were correct. This will give us some firm evidence of the best way to advocate for Open Data in economics. The results will be released when they are available.

Staying within the OKFN framework, I also helped kick-start the Open Humanities Group back into action in a meeting with the organisers and a post to the discussion list (posing the question: What does Open Humanities research data mean to you?). As a humanities researcher myself I am very keen to see the humanities embrace a more open approach to scholarship and it’s great to see a resurgence of activity here. So far this has resulted in a forthcoming Open Literature Sprint on January 25th in London. This sprint will build upon some of the work already completed on the Open Literature and Textus projects for collaborating, analysing and sharing open access and public domain works of literature and philosophy. Whilst I cannot take any credit for organising the event, I will certainly be in attendance and I encourage all those interested in Open Humanities research/data to attend too. We are looking for coders, editors and textfinders for the event – absolutely no technical skills required! You can sign up to attend here.

However, the majority of my time has been spent working on a book: An Introduction to Open Research Data. This edited volume will feature chapters by Open Data experts in a range of academic disciplines, covering practical information on licensing, ethics, and information for data curators, alongside more theoretical issues surrounding the adoption of Open Data. As the book will be Open Access, each chapter will be able to stand alone from the main volume so communities can host, distribute and remix the content that is relevant to them (the book will also be available in print). The table of contents is near enough finalised and the contributions are currently being written. I’m hoping the volume will be ready by August but watch this space! Do get in touch if you’ve any questions at all.

In addition, here is a round-up of the blogposts I’ve written so far:

On the Harvard Dataverse Network Project – an open-source tool for data sharing

What are the incentives for data sharing?

Panton Fellow Introduction: Samuel Moore

 

Citizen Science Project for Air Quality Measurements

- December 11, 2013 in External Meetings, Panton Fellowships, Panton Principles

Chemistry themed lunch!

I have spent the last two days at a meeting run by the Automation and Analytical Management Group (AAMG) of the Royal Society of Chemistry. As well as having a lovely location (the RSC building, Burlington House, isn’t your average conference centre: dessert was served in beakers and the rooms are beautiful), the meeting itself has been very interesting.

Talk topics ranged from new air quality monitoring techniques and the latest deployments of sensor networks to exciting new citizen science projects and the future of air quality monitoring.

The iSPEX add-on being used to measure aerosol properties

It was this final topic that really caught my attention, and in particular a project called iSPEX, which originated in the Netherlands. iSPEX is an add-on for your iPhone which allows the user to take measurements of the properties of aerosols. This project is currently being piloted in the Netherlands and has had some great success. On the first national iSPEX measurement day more than 5000 measurements were collected all over the Netherlands. This brilliant response shows the interest that can be generated by citizen science air quality projects.

I personally cannot wait for this project to be extended to other countries as well because I think it’s gadgets like this that really will start to make some headway towards increasing public interest in Air Quality.

Other projects discussed included installing a large network of low-cost air quality sensors at Heathrow airport and another project from Leicester where air quality outreach is also being pushed through funding from the RSC. Overall a very positive meeting demonstrating the interest in networks of monitors and citizen science concepts.

On the Harvard Dataverse Network Project (and why it’s awesome)

- December 10, 2013 in Panton Fellowships, Panton Principles, Tools

I am a huge fan of grass-roots approaches to scholarly openness. Successful community-led initiatives tend to speak directly to that community’s need and can grow by attracting interest from members on the fringes (just look at the success of the arXiv, for example). But these kinds of projects tend to be smaller scale and can be difficult to sustain, especially without any institutional backing or technical support.

This is why the Harvard Dataverse Network is so great: it facilitates research data sharing through a sustainable, scalable, open-source platform maintained by the Institute for Quantitative Social Science at Harvard. This means it is sustainable through institutional backing, but also empowers individual communities to manage their own research data.

In essence, a Dataverse is simply a data repository, but one that is both free to use and fully customisable according to a community’s need. In the project’s own words:

 

‘A Dataverse is a container for research data studies, customized and managed by its owner. A study is a container for a research data set. It includes cataloging information, data files and complementary files.’

(http://thedata.harvard.edu/dvn/)
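To make the quoted definition concrete, here is a minimal sketch of the containment it describes: a Dataverse holds studies, and each study holds cataloging information, data files and complementary files. The class and field names are my own illustrative assumptions and are not taken from the Dataverse Network software or its API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Study:
    """One research data set, per the quoted definition (names are illustrative)."""
    cataloging_info: Dict[str, str]
    data_files: List[str] = field(default_factory=list)
    complementary_files: List[str] = field(default_factory=list)


@dataclass
class Dataverse:
    """A container for studies, customised and managed by its owner."""
    name: str
    owner: str
    studies: List[Study] = field(default_factory=list)


# Made-up example values, purely for illustration.
study = Study(
    cataloging_info={"title": "Example survey", "subject": "economics"},
    data_files=["responses.csv"],
    complementary_files=["codebook.pdf"],
)
journal_dataverse = Dataverse(name="Example Journal Dataverse", owner="Editorial team")
journal_dataverse.studies.append(study)
print(len(journal_dataverse.studies), "study in", journal_dataverse.name)
```

The point of the nesting is that ownership sits at the Dataverse level while the descriptive metadata travels with each study.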

 

There are a number of ways in which the Dataverse Network can be used to enable Open Data.

Journals

A Dataverse can be a great way of incentivising data deposition among journal authors, especially when coupled with journal policies of mandating Open Data for all published articles. Here, a journal’s editor or editorial team would maintain the Dataverse itself, including its look and feel, which would instil confidence in authors that the data is in trusted hands. In fact, for journals housed on Open Journal Systems, there will soon be a plugin launched that directly links the article submission form with the journal’s Dataverse. And so, from an author’s perspective, the deposition of data will be as seamless as submitting a supporting information file. This presentation [pdf] goes into the plugin in more detail (and provides more info on the Dataverse project itself).

(Sub-)Disciplines

There are some disciplines that simply do not have their own subject-specific repository and so a Dataverse would be great for formalising and incentivising Open Data here. In many communities, datasets are uploaded to general repositories (Figshare, for example) that may not be tailored to their needs. Although this isn’t a problem – it’s great that general repositories exist – a discipline-maintained repository would automatically confer a level of reputation sufficient to encourage others to use it. What’s more, different communities have different preservation/metadata needs that general repositories might not be able to offer, and so the Dataverse could be tailored exactly to that community’s need.

Individuals

Interestingly, individuals can have their own Dataverses for housing all their shared research data. This could be a great way of allowing researchers to showcase their openly available datasets (and perhaps research articles too) in self-contained collections. The Dataverse could be linked to directly from a CV or institutional homepage, offering a kind of advertisement for how open a scholar one is. Furthermore, users can search across all Dataverses for specific keywords, subject areas, and so on, so there is no danger of being siloed off from the broader community.

So the Dataverse Network is a fantastic project for placing the future of Open Data in the hands of researchers and it would be great to see it adopted by scholarly communities throughout the world.

 

What are the incentives for data sharing?

- November 5, 2013 in Panton Fellowships, Panton Principles

I have argued elsewhere that researchers should embrace scholarly openness because of the disciplinary benefits it affords. Specifically, and as is widely argued, Open Data ensures that research can be verified through replication and reused to pose and help answer new questions. Furthermore, in the humanities, Open Data can also contribute to the cultural commons, especially through initiatives such as the DPLA and Europeana. Open Data thus helps research move to more of an economy of sharing, rather than one of mere competition.

But the truth is that academia can be a ruthless area to work in and holding onto data is one way that researchers in some disciplines try to maintain a competitive advantage over their peers. For example, I recently spoke with a public health researcher who told me that she wouldn’t share any of her data until she had completely exhausted its potential for publications, which could take years. After that, she admitted she would have probably moved on to other things and the data would be forgotten about. Whilst this anecdote reflects the practices of only one researcher, I suspect that it reflects common practice for many researchers.

Data sharing therefore needs incentives, tangible rewards for individuals that work within the current system to encourage researchers to open up their data for the wider community. Of course, mandates are important too, although they can be a blunt instrument without broad community support. What, therefore, is the best way to reward data deposition and build community momentum behind Open Data? Three ways spring to mind:

Data citation

The most obvious way to incentivise Open Data is to ensure that data creators are formally credited for their contribution through the use of citations. Adopting a standardised mechanism for citing data will recognise/reward data creators and help track the impact of individual datasets. DataCite suggests the following structure for citing a dataset:

Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

Source: http://www.datacite.org/whycitedata
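As a rough illustration of the structure above, the following sketch assembles a citation string from its parts. The class name, the example values and the decision to treat Version and ResourceType as optional are assumptions made for this example; it is not an official DataCite tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Fields named after the DataCite structure quoted above."""
    creator: str
    publication_year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None        # treated as optional in this sketch
    resource_type: Optional[str] = None  # treated as optional in this sketch

    def format(self) -> str:
        # Join the parts after the colon with ". ", skipping any that are missing.
        parts = [self.title, self.version, self.publisher,
                 self.resource_type, self.identifier]
        body = ". ".join(p for p in parts if p)
        return f"{self.creator} ({self.publication_year}): {body}"


# Entirely made-up example values, for illustration only.
example = DatasetCitation(
    creator="Example, A.",
    publication_year=2014,
    title="Playground air quality readings",
    publisher="Example Repository",
    identifier="doi:10.0000/example",
    version="1.0",
    resource_type="Dataset",
)
print(example.format())
```

Keeping the citation as structured fields rather than a free-text string is what would let a repository or journal track reuse of an individual dataset.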

Nevertheless, data citation is a new and undeveloped concept, and the practicalities are still to be fully worked out. The following report by the CODATA-ICSTI Task Group on Data Citation Standards and Practices goes into more detail on these issues: ‘Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data’.

New collaborations

Data sharing can of course lead to new collaborations with other researchers, either those looking to build upon pre-existing datasets or to group together to collect new data. In many ways, data sharing is an advertisement for the kind of work a researcher is doing – not just the subject expertise, but methodological expertise too – and is a statement that one is open to sharing/collaboration. This approach is particularly prevalent in the digital humanities, which is often seen to set itself apart for its collaborative approach to scholarship (see Digital Humanities Questions & Answers for an example of this collaborative approach). As the field is in its relative infancy, many digital humanists are self-taught according to their individual needs and so there isn’t a methodological canon that researchers are taught, which makes collaborating and sharing skillsets an attractive prospect.

Perception of rigour    

As Wicherts et al. demonstrated, there is a correlation between a willingness to share data and the quality of statistical reporting in psychology. Although this is only a correlation, the argument here is that researchers may take more care over the quality and presentation of their data when they have committed to sharing it, and so researchers who routinely share data can build up a reputation for scholarly rigour. Obviously this incentive is less tangible than the previous two, but it is still worth mentioning that Open Data, and openness in general, can contribute to the overall positive reputation of a researcher.

These appear to me to be the immediately obvious incentives for the average researcher to share their data, and as a Panton Fellow I’m looking to explore these further this year. I would be interested to read any I’ve missed!

 

“It’s not only peer-reviewed, it’s reproducible!”

- October 18, 2013 in Panton Fellowships, Panton Principles, Reproducibility

Peer review is one of the oldest and most respected instruments of quality control in science and research. Peer review means that a paper is evaluated by a number of experts on the topic of the article (the peers). The criteria may vary, but most of the time they include methodological and technical soundness, scientific relevance, and presentation.

“Peer-reviewed” is a widely accepted sign of quality of a scientific paper. Peer review has its problems, but you won’t find many researchers that favour a non peer-reviewed paper over a peer-reviewed one. As a result, if you want your paper to be scientifically acknowledged, you most likely have to submit it to a peer-reviewed journal, even though it will take more time and effort to get it published there than in a non peer-reviewed outlet.

Peer review helps to weed out bad science and pseudo-science, but it also has serious limitations. One of these limitations is that the primary data and other supplementary material such as documentation and source code are usually not available. The results of the paper are thus not reproducible. When I review such a paper, I usually have to trust the authors on a number of issues: that they have described the process of achieving the results as accurately as possible, that they have not left out any crucial pre-processing steps and so on. When I suspect a certain bias in a survey, for example, I can only note that in the review, but I cannot test for that bias in the data myself. When the results of an experiment seem to be too good to be true, I cannot inspect the data pre-processing to see if the authors left out any important steps.

As a result, later efforts to reproduce research results can lead to devastating outcomes. Wang et al. (2010), for example, found that they could reproduce almost none of the literature on a certain topic in computer science.

“Reproducible”: a new quality criterion

Needless to say this is not a very desirable state. Therefore, I argue that we should start promoting a new quality criterion: “reproducible”. Reproducible means that the results achieved in the paper can be reproduced by anyone because all of the necessary supplementary resources have been openly provided along with the paper.

It is easy to see why a peer-reviewed and reproducible paper is of higher quality than just a peer-reviewed one. You do not have to take the researchers’ word for how they calculated their results – you can reconstruct them yourself. As a welcome side-effect, this would make more datasets and source code openly available. Thus, we could start building on each other’s work and aggregate data from different sources to gain new insights.

In my opinion, reproducible papers could be published alongside non-reproducible papers, just like peer-reviewed articles are usually published alongside editorials, letters, and other non peer-reviewed content. I would think, however, that over time, reproducible would become the overall quality standard of choice – just like peer-reviewed is the preferred standard right now. To help this process, journals and conferences could designate a certain share of their space to reproducible papers. I would imagine that they would not have to do that for too long though. Researchers will aim for a higher quality standard, even if it takes more time and effort.

I do not claim that reproducibility solves all of the problems that we see in science and research right now. For example, it will still be possible to manipulate the data to a certain degree. I do, however, believe that reproducibility as an additional quality criterion would be an important step for open and reproducible science and research.

So that you can say to your colleague one day: “Let’s go with the method described in this paper. It’s not only peer-reviewed, it’s reproducible!”

Panton Fellow Introduction: Samuel Moore

- October 3, 2013 in Panton Fellowships, Panton Principles

Hello! My name is Samuel Moore and I am delighted to have been selected for a Panton Fellowship this year. I wanted to write a quick post to introduce myself, my background and my plans for the fellowship.

I am a second-year, part-time PhD student at King’s College London working in the Centre for e-Research, which is part of the King’s Digital Humanities department. Even though the Panton Fellowships have generally been aimed at scientists, I am actually a humanities researcher by training, having studied philosophy as an undergraduate and literature as a master’s student. My PhD research straddles the border between the humanities and social sciences – I am conducting a series of multi-year case studies to assess the extent to which Open Access publishing is changing research practices in a number of humanities subjects. Overall, my academic interests centre on the ways in which ‘openness’ can unlock new methods of scholarly research and communication, and I feel that open data is one key piece in this puzzle.

With this in mind, one of my aims for the year is to advocate for, and build momentum behind, Open Data in the humanities and social sciences (HSS), particularly with a view to improving the guidance on Open Data for HSS researchers. As was the case with Open Access, the sciences have surged ahead in the adoption of Open Data, although there is no obvious reason why HSS communities should lag behind. In the humanities, for instance, digital humanists are creating a variety of datasets that would benefit from sharing and reuse. Likewise, historians digitise huge amounts of source material that often languishes on local hard drives or is simply forgotten about. A number of quotes in the recent report by Ithaka S+R on the changing digital practices of historians support the idea that Open Data would be welcomed in the humanities:

‘Some historians hope that their own digitization work can contribute to more content being made available for both the public and other scholars.’ [p. 12]

‘One [researcher] discussed how a mass digitization of government audio recordings and their availability in the public domain have shaped his career and his research.’ [p. 14]

There is a similar need for Open Data in the social sciences too. The recently launched Open Economics Principles call for Open Data as a default practice in economics, particularly so that results can be reproduced in the hope of avoiding another Reinhart-Rogoff scandal, for example. And in psychology there have been similar calls for the publication of data for verification purposes. There is, therefore, a clear need for Open Data in the humanities and social sciences and I would be especially interested to work with Open Data practitioners in HSS communities on how the guidance can best be improved here – please do get in touch!

Another key objective of my fellowship will be to establish a working group on Open Data publication ethics. This ties in more with my current part-time position as managing editor of the Ubiquity Press metajournals. These journals feature peer-reviewed papers describing openly available datasets in a range of subjects and act as an incentive for researchers to publish their data openly and according to disciplinary standards. As data publishing is such a new field, there is currently no group that journal editors can approach for case-by-case advice on data publishing ethics, similar to the Committee on Publication Ethics, although such a committee would be a useful addition to the publishing community. Again, please do get in touch if you have any ideas for how this committee could operate.

I will be blogging regularly about my Panton activities and you can also connect with me on Twitter. I would be happy to receive suggestions on how best to accomplish my two objectives, or on any aspect of Open Data for that matter!