Building an archaeological project repository I: Open Science means Open Data
In 2010 we authored a series of blog posts for the Open Knowledge Foundation subtitled ‘How open approaches can empower archaeologists’. These discussed the DART project, which is on the cusp of concluding.
The DART project collected large amounts of data, and as part of the project, we created a purpose-built data repository to catalogue this and make it available, using CKAN, the Open Knowledge Foundation’s open-source data catalogue and repository. Here we revisit the need for Open Science in the light of the DART project. In a subsequent post we’ll look at why, with so many repositories of different kinds, we felt that to do Open Science successfully we needed to roll our own.
Open data can change science
Open inquiry is at the heart of the scientific enterprise. Publication of scientific theories – and of the experimental and observational data on which they are based – permits others to identify errors, to support, reject or refine theories and to reuse data for further understanding and knowledge. Science’s powerful capacity for self-correction comes from this openness to scrutiny and challenge. (The Royal Society, Science as an open enterprise, 2012)
The Royal Society’s report Science as an open enterprise identifies how 21st century communication technologies are changing the ways in which scientists conduct, and society engages with, science. The report recognises that ‘open’ enquiry is pivotal for the success of science, both in research and in society. This goes beyond open access to publications (Open Access), to include access to data and other research outputs (Open Data), and the process by which data is turned into knowledge (Open Science).
The underlying rationale of Open Data is this: unfettered access to large amounts of ‘raw’ data enables patterns of re-use and knowledge creation that were previously impossible. The creation of a rich, openly accessible corpus of data introduces a range of data-mining and visualisation challenges, which require multi-disciplinary collaboration across domains (within and outside academia) if their potential is to be realised. An important step towards this is creating frameworks which allow data to be effectively accessed and re-used. The prize for succeeding is improved knowledge-led policy and practice that transforms communities, practitioners, science and society.
The need for such frameworks will be most acute in disciplines with large amounts of data, a range of approaches to analysing the data, and broad cross-disciplinary links – so it was inevitable that they would prove important for our project, Detection of Archaeological residues using Remote sensing Techniques (DART).
DART: data-driven archaeology
DART aims to develop analytical methods to differentiate archaeological sediments from non-archaeological strata on the basis of remotely detected phenomena (e.g. resistivity, apparent dielectric permittivity, crop growth and thermal properties). The data collected by DART is relevant to a broad range of communities. Open Science was adopted with two aims:
- to maximise the research impact by placing the project data and the processing algorithms into the public sphere;
- to build a community of researchers and other end-users around the data so that collaboration, and by extension research value, can be enhanced.
‘Contrast dynamics’, the type of data provided by DART, is critical for policy makers and curatorial managers to assess both the state and the rate of change in heritage landscapes, and helps to address national commitments to the European Landscape Convention (ELC). Making the best use of the data, however, depends on openly accessible dynamic monitoring, along similar lines to that proposed by the European Space Agency for the Global Monitoring for Environment and Security (GMES) satellite constellations. What is required is an accessible framework which allows all this data to be integrated, processed and modelled in a timely manner. The approaches developed in DART to improve the understanding and enhance the modelling of heritage contrast detection dynamics feed directly into this long-term agenda.
Cross-disciplinary research and Open Science
Such approaches cannot be undertaken within a single domain of expertise. This vision can only be built by openly collaborating with other scientists and building on shared data, tools and techniques. Important developments will come from the GMES community, particularly from precision agriculture, soil science, and well-documented data processing frameworks and services. At the same time, the information collected by projects like DART can easily be re-used by others. For example, DART data has been exploited by the Royal Agricultural University (RAU) for applications such as carbon sequestration in hedges, soil management, soil compaction and community mapping. Such openness also promotes collaboration: DART partners have been involved in a number of international grant proposals and have developed a longer-term partnership with the RAU.
Open Science advocates opening access to data, and other scientific objects, at a much earlier stage in the research life-cycle than traditional approaches. Open Scientists argue that research synergy and serendipity occur through openly collaborating with other researchers (more eyes/minds looking at the problem). Of great importance is the fact that the scientific process itself is transparent and can be peer reviewed: as a result of exposing data and the processes by which these data are transformed into information, other researchers can replicate and validate the techniques. As a consequence, we believe that collaboration is enhanced and the boundaries between public, professional and amateur are blurred.
Challenges ahead for Open Science
Whilst DART has not achieved all its aims, it has made significant progress and has identified some barriers to achieving such open approaches. Key among these is the articulation of issues surrounding data access (accreditation), licensing and ethics. Who gets access to data, when, and under what conditions, is a serious ethical issue for the heritage sector. These issues need co-ordination through organisations like Research Councils UK, with cross-cutting input from domain groups. The Arts and Humanities community produces data and outputs with pervasive social and ethical impact, and it is clearly important that it has a voice in these debates.