What are the incentives for data sharing?

November 5, 2013 in Panton Fellowships, Panton Principles

I have argued elsewhere that researchers should embrace scholarly openness because of the disciplinary benefits it affords. Specifically, and as is widely argued, Open Data ensures that research can be verified through replication and reused to pose and help answer new questions. Furthermore, in the humanities, Open Data can also contribute to the cultural commons, especially through initiatives such as the DPLA and Europeana. Open Data thus helps research move to more of an economy of sharing, rather than one of mere competition.

But the truth is that academia can be a ruthless area to work in, and holding onto data is one way that researchers in some disciplines try to maintain a competitive advantage over their peers. For example, I recently spoke with a public health researcher who told me that she wouldn’t share any of her data until she had completely exhausted its potential for publications, which could take years. After that, she admitted, she would probably have moved on to other things and the data would be forgotten. Whilst this anecdote describes the practices of only one researcher, I suspect that many researchers behave in the same way.

Data sharing therefore needs incentives: tangible rewards for individuals, operating within the current system, that encourage researchers to open up their data for the wider community. Of course, mandates are important too, although they can be a blunt instrument without broad community support. What, therefore, is the best way to reward data deposition and build community momentum behind Open Data? Three ways spring to mind:

Data citation

The most obvious way to incentivise Open Data is to ensure that data creators are formally credited for their contribution through the use of citations. Adopting a standardised mechanism for citing data will recognise and reward data creators and help track the impact of individual datasets. DataCite suggests the following structure for citing a dataset:

Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

Source: http://www.datacite.org/whycitedata
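
To make the structure concrete, here is a minimal Python sketch that assembles a citation string in that order. It is only an illustration, not an official DataCite tool: the class name, field names and every example value (including the DOI) are hypothetical, and treating Version and ResourceType as optional is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the fields in the citation structure above."""
    creator: str
    publication_year: int
    title: str
    publisher: str
    identifier: str
    # Assumption of this sketch: these two fields may be left out.
    version: Optional[str] = None
    resource_type: Optional[str] = None

    def format(self) -> str:
        # Order follows: Creator (PublicationYear): Title. Version.
        # Publisher. ResourceType. Identifier
        parts = [
            self.title,
            self.version,
            self.publisher,
            self.resource_type,
            self.identifier,
        ]
        # Skip any field that was not supplied.
        body = ". ".join(p for p in parts if p)
        return f"{self.creator} ({self.publication_year}): {body}"


# All values below are invented for illustration.
example = DatasetCitation(
    creator="Doe, J.",
    publication_year=2013,
    title="Example Survey Responses",
    version="1.0",
    publisher="Example Data Archive",
    resource_type="Dataset",
    identifier="https://doi.org/10.1234/example",
)
print(example.format())
```

A persistent identifier such as a DOI in the final position is what would allow a citation like this to be resolved and tracked over time.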

Nevertheless, data citation is a new and undeveloped concept, and the practicalities are still to be fully worked out. The following report by the CODATA-ICSTI Task Group on Data Citation Standards and Practices goes into more detail on these issues: ‘Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data’.

New collaborations

Data sharing can of course lead to new collaborations with other researchers, whether they are looking to build upon pre-existing datasets or to group together to collect new data. In many ways, data sharing is an advertisement for the kind of work a researcher is doing (not just the subject expertise, but methodological expertise too) and is a statement that one is open to sharing and collaboration. This approach is particularly prevalent in the digital humanities, which is often seen to set itself apart by its collaborative approach to scholarship (see Digital Humanities Questions & Answers for an example of this). As the field is still in its relative infancy, many digital humanists are self-taught according to their individual needs, so there is no shared methodological canon, which makes collaborating and sharing skillsets an attractive prospect.

Perception of rigour

As Wicherts et al. demonstrated, there is a correlation between a willingness to share data and the quality of statistical reporting in psychology. Although this is only a correlation, the argument here is that researchers may take more care over the quality and presentation of their data when they have committed to sharing it, and so researchers who routinely share data can build up a reputation for scholarly rigour. Obviously this incentive is less tangible than the previous two, but it is still worth mentioning that Open Data, and openness in general, can contribute to the overall positive reputation of a researcher.

These appear to me to be the most immediately obvious incentives for the average researcher to share their data, and as a Panton Fellow I’m looking to explore them further this year. I would be interested to hear about any I’ve missed!

Tags: Open Data