Public Health Data: as Open as it can be?

April 23, 2014 in Panton Fellowships

I recently sent out invitations for a forthcoming article collection entitled ‘Exemplar Public Health Datasets’, to be published in the recently launched journal Open Health Data. The collection will feature peer-reviewed articles describing public health datasets as part of the ‘Enhancing discoverability of public health and epidemiology research data’ project. Funded by the Wellcome Trust, the project seeks to appraise the ways in which public health datasets could be made easier for potential users to discover, and this article collection is one way of exploring the issue.


The collection will be composed of Data Papers, which are publications designed to make other researchers aware of data that is of potential use to them. Importantly, a Data Paper does not replace a research article, but rather complements it: it describes the methods used to create the dataset, its structure and its reuse potential, and provides a link to its location in a repository.
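To make those components concrete, here is a minimal sketch in Python of how the contents of a Data Paper might be summarised as a simple record. The class name, field names, example values and URL are all hypothetical and used for illustration only; this is not a schema used by Open Health Data or any repository.

```python
from dataclasses import dataclass, asdict


@dataclass
class DataPaperRecord:
    """Hypothetical summary of what a Data Paper reports about a dataset."""
    title: str
    methods: str           # how the dataset was created
    structure: str         # how the dataset is organised
    reuse_potential: str   # what other researchers might do with it
    repository_url: str    # where the dataset is held

    def missing_fields(self) -> list[str]:
        """Return the names of any components left blank."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# Illustrative values only; this is not a real dataset or repository.
record = DataPaperRecord(
    title="Example cohort dataset",
    methods="Longitudinal survey collected in three waves",
    structure="One CSV file per wave, one row per participant",
    reuse_potential="",
    repository_url="https://repository.example.org/datasets/example-cohort",
)

print(record.missing_fields())  # ['reuse_potential']
```

A check like this simply flags which of the descriptive components has not yet been written, which is the kind of gap a Data Paper is meant to close.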

However, one issue that immediately presented itself is that most public health research data is not collected in a way that allows open sharing. Public health research often takes the form of large-scale longitudinal studies involving numerous research groups, during which a great deal of patient data is collected. Whilst the data is anonymised, there are always concerns about how robust the de-identification is, especially given the sensitive nature of the material, and so data is shared only with those who meet the accessibility criteria. As Jones et al. write, regarding the Secure Anonymous Information Linkage (SAIL) Gateway:

‘Even though the data are anonymised, someone with legitimate access to the data, or a potential intruder, may attempt to re-identify individuals or clinicians. It is essential, therefore, that anonymisation is robust, that measures to further encrypt key variables are in place, and that data presented can be limited to the needs of a given project.’ [1]

Because of this, data sharing in public health is approached with extreme caution, and there are many disincentives to sharing at all. The Exemplar Public Health Datasets collection aims to change this by formalising the process for data access. For example, if there are accessibility criteria associated with a particular dataset, a Data Paper would be a great place to outline the criteria, the location of the dataset, and the steps needed to access it. What’s more, whilst the data itself might not be shareable, there is still a great deal of value in openly sharing consent forms, metadata and related protocols. The Data Paper format encourages the sharing of all elements related to the research lifecycle, aiming to reach a position where ‘Open’ is the default for public health research whilst still negotiating the complex world of access to patient data.

Get in touch if you have any questions!

[1] Jones et al., ‘A case study of the Secure Anonymous Information Linkage (SAIL) Gateway: A privacy-protecting remote access system for health-related research and evaluation’, Journal of Biomedical Informatics (in press).


