You are browsing the archive for Samuel Moore.

Panton Fellowship: End-of-Year Round-up

- November 10, 2014 in Featured, Panton Fellowships

This is cross-posted from Samuel Moore’s blog, Scholarly Skywritings.

My time as a Panton Fellow has been both busy and extremely rewarding. In the last year I’ve been involved in a number of projects, met some fantastic people and attended a number of events centered on data sharing within academia. Whilst data sharing in the humanities and social sciences is still at a very nascent stage, especially in terms of the average researcher’s awareness of open data, there is definitely a sense that it is firmly on the agenda as part of the broader move towards openness in scholarly research. The crucial thing now is to continue to reach out to the average researcher, highlighting the benefits that open data offers and ensuring that there is a stock of accessible resources offering practical advice to researchers on how to share their data.

Issues in Open Research Data

With this in mind, in this final post I had originally wanted to be able to share the open-access book I’ve commissioned entitled Issues in Open Research Data, but alas it is still in production and will be published in November. Nevertheless, I am delighted to say that the book was successfully funded via the crowd-funding website Unglue.It and will be available in PDF, EPUB and low-cost print editions when it is published. The book features chapters by open data experts in a range of academic disciplines, covering practical information on licensing, ethics, and advice for data curators, alongside more theoretical issues surrounding the adoption of open data.

As the book will be open access, each chapter will be able to stand alone from the main volume so that communities can host, distribute, build upon and remix the content. The book is primarily a work of advocacy and aims to start a conversation with the academic community at large – I’ll be sending out copies to research libraries, repositories and others that might be interested. Do get in touch if you think your institution would like a printed copy and I’ll see what I can do.

Journal of Open Humanities Data

Another initiative I wanted to mention is the forthcoming Journal of Open Humanities Data, which will be launching very soon through Ubiquity Press. The journal will feature peer-reviewed publications describing humanities data or techniques with high potential for reuse, everything from cultural items to large text corpora. In doing this, the journal aims to incentivise data sharing through publication credit, which in turn makes data citable through usual academic paper citation practices. Ultimately the journal will help researchers share their data, recommending repositories and best practices in the field, and will also help them track the impact of their data through citations and altmetrics. The call for papers will be posted in the next few weeks but, again, please do get in touch if you’d like to hear more.

Thanks!

Last of all, many thanks to the Open Knowledge Foundation for all their advice and support: specifically, Peter Murray-Rust, Michelle Brook, Jenny Molloy and Jonathan Grey, and many others too. I have already signed up to be involved in a few Open Knowledge projects in the coming year and I look forward to helping further the cause of openness across academia (and maybe working on my PhD..!)

Here is a roundup of some of the activities I’ve been involved in over the past year:

Blog posts

Books

Issues in Open Research Data

Project involvement

Panton Fellow Update: Introduction to Open Research Data

- May 5, 2014 in Panton Fellowships, Panton Principles

In my first three-month update report I discussed the book I’m working on as the major output of my Panton Fellowship. Entitled Introduction to Open Research Data, the book explores both the practical and theoretical issues associated with Open Data from a range of general and disciplinary viewpoints. The book will be Open Access, available in various ebook formats and low-cost print editions, and remixing will be encouraged – particularly the subject-specific guidance, which disciplinary communities can build upon as a foundation for a collection of resources on Open Data.

Whilst I am still awaiting a couple of contributions, I am happy to be able to share a provisional table of contents for the book. (Chapter topics are on the left and authors on the right. Chapter titles are still TBD.)

1. Foreword: Introduction to the Panton Fellowships
2. Introduction to the book and the Panton Principles – Sam Moore (with input from the original Panton group)
3. Open Content Mining – Peter Murray-Rust and Jenny Molloy
4. Open Data and Neoliberalism – Eric Kansa
5. Data Sharing in a Humanitarian Organization: The Experience of Médecins Sans Frontières – Unni Karunakara (previously published in PLOS Medicine)
6. Open Data in Earth/Climate Sciences – Sarah Callaghan
7. Open Data in Psychology – Wouter van den Bos, Mirjam Jenny and Dirk Wulff
8. Digital Humanities and Linked Open Data – Jodi Schneider
9. Open Data in Palaeontology – Ross Mounce
10. Open Data in the Health Sciences – Tom Pollard
11. Open Data in Economics – Velichka Dimitrova
12. Why Open Drug Discovery Needs Four Simple Rules for Licensing Data and Models – Antony J. Williams, John Wilbanks and Sean Ekins (previously published in PLOS Computational Biology)

I won’t go into more detail about the content of each chapter, though authors were given free rein to approach the subject however they saw fit. Furthermore, I sought permission from the authors of the previously published pieces, though they were originally published under CC BY, and all were happy for their contributions to appear in the book.

I’m super excited for how this is coming together and I hope to have the book published by August. I will of course be posting updates along the way. Get in touch if you have any questions!

Public Health Data: as Open as it can be?

- April 23, 2014 in Panton Fellowships

I recently sent out invitations for a forthcoming article collection entitled Exemplar Public Health Datasets, to be published in the recently launched journal Open Health Data. The collection will feature peer-reviewed articles describing public health datasets as part of the Enhancing discoverability of public health and epidemiology research data project. Funded by the Wellcome Trust, the project seeks to appraise the ways in which public health datasets could be made easier for potential users to discover, and this article collection is one way of exploring the issue.


The collection will be composed of Data Papers, which are publications designed to make other researchers aware of data that is of potential use to them. Importantly, a data paper does not replace a research article, but rather complements it. As such, the data paper describes the methods used to create the dataset, its structure, its reuse potential, and a link to its location in a repository.

However, one issue that immediately presented itself is that most public health research data is not collected in a way that allows open sharing. Public health research often takes the form of large-scale longitudinal studies involving numerous research groups, during which a great deal of patient data is collected. Whilst the data is anonymised, there are always concerns surrounding de-identification, especially given the sensitive nature of the material, and so data is shared only with those who meet the accessibility criteria. As Jones et al. write, regarding the Secure Anonymous Information Linkage (SAIL) Gateway:

‘Even though the data are anonymised, someone with legitimate access to the data, or a potential intruder, may attempt to re-identify individuals or clinicians. It is essential, therefore, that anonymisation is robust, that measures to further encrypt key variables are in place, and that data presented can be limited to the needs of a given project.’ [1]

Because of this, data sharing in public health is approached with extreme caution and there are many disincentives for doing so. The Exemplar Public Health Datasets collection aims to change this by formalising the process for data access. For example, if there are accessibility criteria associated with a particular dataset, a Data Paper would be a great place for outlining the criteria, the location of the dataset, and steps needed to access it. What’s more, whilst the data itself might not be shareable, there is still a great deal of value in openly sharing consent forms, metadata and related protocols. The Data Paper format encourages the sharing of all elements related to the research lifecycle, aiming to reach a position where ‘Open’ is the default for public health research whilst still negotiating the complex world of access to patient data.

Get in touch if you have any questions!

[1] Jones et al. ‘A case study of the Secure Anonymous Information Linkage (SAIL) Gateway: A privacy-protecting remote access system for health-related research and evaluation’ Journal of Biomedical Informatics (in press) http://doi.org/10.1016/j.jbi.2014.01.003

 

 

Panton Fellow Update: Samuel Moore

- January 8, 2014 in Panton Fellowships, Panton Principles, Publications

My first few months as a Panton Fellow have flown by and so I wanted to provide a quick update on the work I’ve been doing. Whilst it’s not possible to discuss everything, I thought it would be good to list some of the larger projects I’ve been working on.

Early into the fellowship I made contact with two of the Open Economics Working Group coordinators, Velichka Dimitrova and Sander Van Der Waal, to discuss how best to encourage Open Data in economics. Whilst we thought that a data journal could be a good way of incentivising data sharing, we also thought it would be sensible to conduct a survey of economists and their data sharing habits to see if our assumptions were correct. This will give us some firm evidence of the best way to advocate for Open Data in economics. The results will be released when they are available.

Staying within the OKFN framework, I also helped kick-start the Open Humanities Group back into action in a meeting with the organisers and a post to the discussion list (posing the question: What does Open Humanities research data mean to you?). As a humanities researcher myself I am very keen to see the humanities embrace a more open approach to scholarship and it’s great to see a resurgence of activity here. So far this has resulted in a forthcoming Open Literature Sprint on January 25th in London. This sprint will build upon some of the work already completed on the Open Literature and Textus projects for collaborating, analysing and sharing open access and public domain works of literature and philosophy. Whilst I cannot take any credit for organising the event, I will certainly be in attendance and I encourage all those interested in Open Humanities research/data to attend too. We are looking for coders, editors and textfinders for the event – absolutely no technical skills required! You can sign up to attend here.

However, the majority of my time has been spent working on a book: An Introduction to Open Research Data. This edited volume will feature chapters by Open Data experts in a range of academic disciplines, covering practical information on licensing, ethics, and information for data curators, alongside more theoretical issues surrounding the adoption of Open Data. As the book will be Open Access, each chapter will be able to stand alone from the main volume so communities can host, distribute and remix the content that is relevant to them (the book will also be available in print). The table of contents is near enough finalised and the contributions are currently being written. I’m hoping the volume will be ready by August but watch this space! Do get in touch if you’ve any questions at all.

In addition, here is a round-up of the blogposts I’ve written so far:

On the Harvard Dataverse Network Project – an open-source tool for data sharing

What are the incentives for data sharing?

Panton Fellow Introduction: Samuel Moore

 

On the Harvard Dataverse Network Project (and why it’s awesome)

- December 10, 2013 in Panton Fellowships, Panton Principles, Tools

I am a huge fan of grass-roots approaches to scholarly openness. Successful community-led initiatives tend to speak directly to that community’s need and can grow by attracting interest from members on the fringes (just look at the success of the arXiv, for example). But these kinds of projects tend to be smaller scale and can be difficult to sustain, especially without any institutional backing or technical support.

This is why the Harvard Dataverse Network is so great: it facilitates research data sharing through a sustainable, scalable, open-source platform maintained by the Institute for Quantitative Social Sciences at Harvard. This means it is sustainable through institutional backing, but also empowers individual communities to manage their own research data.

In essence, a Dataverse is simply a data repository, but one that is both free to use and fully customisable according to a community’s need. In the project’s own words:

 

‘A Dataverse is a container for research data studies, customized and managed by its owner. A study is a container for a research data set. It includes cataloging information, data files and complementary files.’

(http://thedata.harvard.edu/dvn/)
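To make that definition a little more concrete, here is a minimal sketch of the container relationship it describes: a Dataverse holds studies, and each study holds cataloging information, data files and complementary files. The class and field names (Dataverse, Study, cataloging_info and so on) are my own illustrative assumptions and are not taken from the Dataverse Network software or its API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Study:
    """A container for one research data set (hypothetical model)."""
    cataloging_info: Dict[str, str]                       # e.g. title, author, keywords
    data_files: List[str] = field(default_factory=list)   # the data itself
    complementary_files: List[str] = field(default_factory=list)  # e.g. codebooks


@dataclass
class Dataverse:
    """A container for studies, customised and managed by its owner."""
    owner: str
    studies: List[Study] = field(default_factory=list)

    def add_study(self, study: Study) -> None:
        self.studies.append(study)


# Illustrative usage with made-up values.
journal_dataverse = Dataverse(owner="Example Journal editorial team")
journal_dataverse.add_study(
    Study(
        cataloging_info={"title": "Example survey data"},
        data_files=["responses.csv"],
        complementary_files=["codebook.pdf"],
    )
)
print(len(journal_dataverse.studies))
```

The point of the sketch is simply that ownership sits at the Dataverse level while the descriptive metadata travels with each study.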

 

There are a number of ways in which the Dataverse Network can be used to enable Open Data.

Journals

A Dataverse can be a great way of incentivising data deposition among journal authors, especially when coupled with journal policies of mandating Open Data for all published articles. Here, a journal’s editor or editorial team would maintain the Dataverse itself, including its look and feel, which would instil confidence in authors that the data is in trusted hands. In fact, for journals housed on Open Journal Systems, there will soon be a plugin launched that directly links the article submission form with the journal’s Dataverse. And so, from an author’s perspective, the deposition of data will be as seamless as submitting a supporting information file. This presentation [pdf] goes into the plugin in more detail (and provides more info on the Dataverse project itself).

(Sub-)Disciplines

There are some disciplines that simply do not have their own subject-specific repository and so a Dataverse would be great for formalising and incentivising Open Data here. In many communities, datasets are uploaded to general repositories (Figshare, for example) that may not be tailored to their needs. Although this isn’t a problem – it’s great that general repositories exist – a discipline-maintained repository would automatically confer a level of reputation sufficient to encourage others to use it. What’s more, different communities have different preservation/metadata needs that general repositories might not be able to offer, and so the Dataverse could be tailored exactly to that community’s need.

Individuals

Interestingly, individuals can have their own Dataverses for housing all their shared research data. This could be a great way of allowing researchers to showcase their openly available datasets (and perhaps research articles too) in self-contained collections. The Dataverse could be linked to directly from a CV or institutional homepage, offering a kind of advertisement for how open a scholar one is. Furthermore, users can search across all Dataverses for specific keywords, subject areas, and so on, so there is no danger of being siloed off from the broader community.

So the Dataverse Network is a fantastic project for placing the future of Open Data in the hands of researchers and it would be great to see it adopted by scholarly communities throughout the world.

 

What are the incentives for data sharing?

- November 5, 2013 in Panton Fellowships, Panton Principles

I have argued elsewhere that researchers should embrace scholarly openness because of the disciplinary benefits it affords. Specifically, and as is widely argued, Open Data ensures that research can be verified through replication and reused to pose and help answer new questions. Furthermore, in the humanities, Open Data can also contribute to the cultural commons, especially through initiatives such as the DPLA and Europeana. Open Data thus helps research move to more of an economy of sharing, rather than one of mere competition.

But the truth is that academia can be a ruthless area to work in and holding onto data is one way that researchers in some disciplines try to maintain a competitive advantage over their peers. For example, I recently spoke with a public health researcher who told me that she wouldn’t share any of her data until she had completely exhausted its potential for publications, which could take years. After that, she admitted she would have probably moved on to other things and the data would be forgotten about. Whilst this anecdote reflects the practices of only one researcher, I suspect that it reflects common practice for many researchers.

Data sharing therefore needs incentives, tangible rewards for individuals that work within the current system to encourage researchers to open up their data for the wider community. Of course, mandates are important too, although they can be a blunt instrument without broad community support. What, therefore, is the best way to reward data deposition and build community momentum behind Open Data? Three ways spring to mind:

Data citation

The most obvious way to incentivise Open Data is to ensure that data creators are formally credited for their contribution through the use of citations. Adopting a standardised mechanism for citing data will recognise/reward data creators and help track the impact of individual datasets. DataCite suggests the following structure for citing a dataset:

Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

Source: http://www.datacite.org/whycitedata
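As a rough illustration of how that structure could be assembled in practice, here is a small sketch that builds a citation string from its parts. The class name DatasetCitation, the decision to treat Version and ResourceType as optional, and all of the example values are my own assumptions for illustration; this is not an official DataCite tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Fields follow the order of the DataCite-style structure quoted above."""
    creator: str
    publication_year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None        # assumed optional in this sketch
    resource_type: Optional[str] = None  # assumed optional in this sketch

    def format(self) -> str:
        # Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier
        parts = [
            self.title,
            self.version,
            self.publisher,
            self.resource_type,
            self.identifier,
        ]
        # Skip any optional element that was not supplied.
        body = ". ".join(part for part in parts if part)
        return f"{self.creator} ({self.publication_year}): {body}"


# Entirely made-up example values.
citation = DatasetCitation(
    creator="Doe, J.",
    publication_year=2013,
    title="Example Survey Data",
    publisher="Example Repository",
    identifier="doi:10.1234/example",
    version="1.0",
    resource_type="Dataset",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than as one free-text string, is what would let a repository or journal track citations to a dataset consistently.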

Nevertheless, data citation is a new and undeveloped concept, and the practicalities are still to be fully worked out. The following report by the CODATA-ICSTI Task Group on Data Citation Standards and Practices goes into more detail on these issues: ‘Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data’.

New collaborations

Data sharing can of course lead to new collaborations with other researchers, either those looking to build upon pre-existing datasets or to group together to collect new data. In many ways, data sharing is an advertisement for the kind of work a researcher is doing – not just the subject expertise, but methodological expertise too – and is a statement that one is open to sharing/collaboration. This approach is particularly prevalent in the digital humanities, which is often seen to set itself apart for its collaborative approach to scholarship (see Digital Humanities Questions & Answers for an example of this collaborative approach). As the field is in its relative infancy, many digital humanists are self-taught according to their individual needs and so there isn’t a methodological canon that researchers are taught, which makes collaborating and sharing skillsets an attractive prospect.

Perception of rigour

As Wicherts et al. demonstrated, there is a correlation between a willingness to share data and the quality of statistical reporting in psychology. Although this is only a correlation, the argument here is that researchers may take more care over the quality and presentation of their data when they have committed to sharing it, and so researchers who routinely share data can build up a reputation for scholarly rigour. Obviously this incentive is less tangible than the previous two, but it is still worth mentioning that Open Data, and openness in general, can contribute to the overall positive reputation of a researcher.

These appear to me to be the immediately obvious incentives for the average researcher to share their data, and as a Panton Fellow I’m looking to explore these further this year. I would be interested to read any I’ve missed!

 

Panton Fellow Introduction: Samuel Moore

- October 3, 2013 in Panton Fellowships, Panton Principles

Hello! My name is Samuel Moore and I am delighted to have been selected for a Panton Fellowship this year. I wanted to write a quick post to introduce myself, my background and my plans for the fellowship.

I am a second-year, part-time PhD student at King’s College London working in the Centre for e-Research, which is part of the King’s Digital Humanities department. Even though the Panton Fellowships have generally been aimed at scientists, I am actually a humanities researcher by training, having studied philosophy as an undergraduate and literature as a master’s student. My PhD research straddles the border between the humanities and social sciences – I am conducting a series of multi-year case studies to assess the extent to which Open Access publishing is changing research practices in a number of humanities subjects. Overall, my academic interests centre on the ways in which ‘openness’ can unlock new methods of scholarly research and communication, and I feel that open data is one key piece in this puzzle.

With this in mind, one of my aims for the year is to advocate for, and build momentum behind, Open Data in the humanities and social sciences (HSS), particularly with a view to improving the guidance on Open Data for HSS researchers. As was the case with Open Access, the sciences have surged ahead in the adoption of Open Data, although there is no obvious reason why HSS communities should lag behind. In the humanities, for instance, digital humanists are creating a variety of datasets that would benefit from sharing and reuse. Likewise, historians digitise huge amounts of source material that often languishes on local hard drives or is simply forgotten about. A number of quotes in the recent report by Ithaka S + R on the changing digital practices of historians support the idea that Open Data would be welcomed in the humanities:

‘Some historians hope that their own digitization work can contribute to more content being made available for both the public and other scholars.’ [p. 12]

‘One [researcher] discussed how a mass digitization of government audio recordings and their availability in the public domain have shaped his career and his research.’ [p. 14]

There is a similar need for Open Data in the social sciences too. The recently launched Open Economics Principles call for Open Data as a default practice in economics, particularly so that results can be reproduced in the hope of avoiding another Reinhart-Rogoff scandal, for example. And in psychology there have been similar calls for the publication of data for verification purposes. There is, therefore, a clear need for Open Data in the humanities and social sciences and I would be especially interested to work with Open Data practitioners in HSS communities on how the guidance can best be improved here – please do get in touch!

Another key objective of my fellowship will be to establish a working group on Open Data publication ethics. This ties in more with my current part-time position as managing editor of the Ubiquity Press metajournals. These journals feature peer-reviewed papers describing openly available datasets in a range of subjects and act as an incentive for researchers to publish their data openly and according to disciplinary standards. As data publishing is such a new field, there is currently no group that journal editors can approach for case-by-case advice on data publishing ethics, similar to the Committee on Publication Ethics, although such a committee would be a useful addition to the publishing community. Again, please do get in touch if you have any ideas for how this committee could operate.

I will be blogging regularly about my Panton activities and you can also connect with me on Twitter. I would be happy to receive suggestions on how best to accomplish my two objectives, or on any aspect of Open Data for that matter!