
That was 2014, and here comes 2015!

- December 22, 2014 in Featured, Open Data News, openGLAM, Planet

When the number of out-of-office replies in your inbox climbs as fast as the number of Vanillekipferl in your stomach, the time is slowly coming for contemplative* holidays, during which, over yet more cookies and a hot drink of your choice, you review the old year and look forward to what the new one will bring. We have already taken care of the cookies and now look back on a successful 2014…

Looking back: what we achieved in 2014

The Open Data Portal Österreich, a joint project with Wikimedia, marked a major step forward for the open data community and quickly established itself as a fixture. With it, Austria is the only country with a shared open data infrastructure for public administration, business, culture, NGOs and many more. The overall concept of open data in Austria, with the government portal data.gv.at and its sister platform opendataportal.at, is and remains a showcase example in Europe, honoured with the UN Public Service Award.

Alongside active community work and advocacy within the public administration, the OKFN also ensured lively participation in the working groups of the Digitale Agenda Wien and, at various MeetUps, most recently the X-mas Gathering, connected the established community with new participants.

In the process we also strengthened the next generation of the Austrian IT scene: with the Young Coders Festival in October, the first open data hackathon for young people from all over Austria took place. Here is the video summary (4 min) of the exciting weekend.

Young Coders Festival 2014 - group photo

Great joy at the Young Coders Festival 2014, which will be repeated in the coming year!

The Open Science working group around Stefan Kasberger also looks back on an eventful year, in which open science was not only discussed intensively within the community but also received attention in the press and in politics. In January 2014 the OKFN joined the Open Access Network Austria (OANA) and took part in three working groups. In June, Peter Murray-Rust and Michelle Brook, who had travelled to Austria for a talk at the Wissenschaftsfonds FWF, also visited the Metalab Vienna at the invitation of the Open Science group for a hackathon followed by a MeetUp on content mining. On the MS Wissenschaft in September, which the Open Knowledge Foundation organised in cooperation with Wikimedia and the Wissenschaftsfonds FWF, experts from Germany, Austria and Switzerland discussed, among other things, the opportunities of open science.

There is also plenty to report from the OpenGLAM working group: “Our goal in 2014 was to raise awareness of open data in cultural institutions, especially in view of the upcoming amendment to the PSI Directive, which extends the re-use of public sector documents to the GLAM (galleries, libraries, archives, museums) sector as well,” recaps Bernhard Haslhofer, OKFN board member and co-coordinator of OpenGLAM in Austria. Talks on open data in the cultural sector (Wien Museum) and on the application of Linked Data in the digital humanities (Österreichische Akademie der Wissenschaften) supported this effort.

 

Outlook: what we will achieve in 2015 …
After an eventful 2014, we have plenty planned for the new year as well. In January, the project Open Data Inside launches under the motto “Where open data is inside, it should say open data on the outside”; it uses a digital badge to highlight the quality and relevance of open data in the business sector too. The netidee-funded project Gute Taten für gute Daten frees datasets from their closed formats, and the OKFN is also contributing its expertise to a further project that we will announce soon. In autumn we will once again invite young programming talents to the Young Coders Festival, and the Open Data Portal will further expand its role as a fixed star in the open data sky. In addition, we will stay active in the community and continue to organise MeetUps and other gatherings.

 

… with your support!
As a charitable, non-profit association, the Open Knowledge Foundation Österreich depends on the support of its members. Only with your support can we use our capacities to the full and, in the coming year, create even more awareness and progress for open data, open science, transparency and openness. With that in mind, we have one pious wish this Christmas: show us your support! Join in and become an OKFN member, because networking works better together than alone.

 

With that in mind, we wish you happy holidays and a good start to the new year!

 

*Tip for all stressed last-minute shoppers: an OKFN membership is a wonderful and lasting Christmas gift that can be purchased here, entirely without shopping-street madness and shopping-mall mayhem.

Love your data – and let others love it, too

- January 16, 2014 in Reproducibility

[This post is also available in French. Ce billet est également disponible en français.]

The Projects initiative, a Digital Science endeavour, provides a desktop app that lets you comprehensively organize and manage the data you produce as research projects progress. The rationale behind Projects is that scientific data needs to be properly managed and preserved if we want it to last: there is a worrisome trend in which the amount of research data being generated increases by 30% every year, and yet a massive 80% of scientific data is lost within two decades.

Projects and the open science data sharing platform figshare published an impressive and pretty telling infographic on science data preservation and chronic mismanagement [scroll down to see it]. What struck me when looking at these numbers is neither the high-throughput data production nor the overall funds it requires (1.5 trillion USD spent on R&D!) but how little information there is on public policies aimed at solving the problem.


What are the incentives for data sharing?

- November 5, 2013 in Panton Fellowships, Panton Principles

I have argued elsewhere that researchers should embrace scholarly openness because of the disciplinary benefits it affords. Specifically, and as is widely argued, Open Data ensures that research can be verified through replication and reused to pose and help answer new questions. Furthermore, in the humanities, Open Data can also contribute to the cultural commons, especially through initiatives such as the DPLA and Europeana. Open Data thus helps research move to more of an economy of sharing, rather than one of mere competition.

But the truth is that academia can be a ruthless area to work in, and holding onto data is one way that researchers in some disciplines try to maintain a competitive advantage over their peers. For example, I recently spoke with a public health researcher who told me that she wouldn’t share any of her data until she had completely exhausted its potential for publications, which could take years. After that, she admitted, she would probably have moved on to other things and the data would be forgotten about. Whilst this anecdote describes the practices of only one researcher, I suspect that it reflects common practice for many researchers.

Data sharing therefore needs incentives: tangible rewards for individuals that work within the current system and encourage researchers to open up their data for the wider community. Of course, mandates are important too, although they can be a blunt instrument without broad community support. What, therefore, is the best way to reward data deposition and build community momentum behind Open Data? Three ways spring to mind:

Data citation

The most obvious way to incentivise Open Data is to ensure that data creators are formally credited for their contribution through the use of citations. Adopting a standardised mechanism for citing data will recognise and reward data creators and help track the impact of individual datasets. DataCite suggests the following structure for citing a dataset:

Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

Source: http://www.datacite.org/whycitedata
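As a rough illustration of how that structure could be assembled from a dataset's metadata, here is a minimal Python sketch. The function name, the decision to treat Version and ResourceType as optional, and all example values (including the DOI) are hypothetical and chosen only for illustration; this is not an official DataCite tool.

```python
def format_data_citation(creator, year, title, publisher, identifier,
                         version=None, resource_type=None):
    """Build a citation string in the order quoted above:
    Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

    Assumption: version and resource_type may be left out, in which
    case they are simply skipped.
    """
    parts = [f"{creator} ({year}): {title}"]
    if version:
        parts.append(str(version))
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)
    parts.append(identifier)
    # Elements are separated by a full stop and a space.
    return ". ".join(parts)


# Hypothetical example values, not a real dataset.
citation = format_data_citation(
    creator="Doe, Jane",
    year=2013,
    title="Example Survey Data",
    publisher="Example Repository",
    identifier="doi:10.1234/example",
    version="1.0",
    resource_type="Dataset",
)
print(citation)
```

Keeping the citation in one predictable shape is what makes it possible to track how often a given dataset is cited.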

Nevertheless, data citation is a new and undeveloped concept, and the practicalities are still to be fully worked out. The following report by the CODATA-ICSTI Task Group on Data Citation Standards and Practices goes into more detail on these issues: ‘Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data’.

New collaborations

Data sharing can of course lead to new collaborations with other researchers, either those looking to build upon pre-existing datasets or to group together to collect new data. In many ways, data sharing is an advertisement for the kind of work a researcher is doing – not just the subject expertise, but methodological expertise too – and is a statement that one is open to sharing/collaboration. This approach is particularly prevalent in the digital humanities, which is often seen to set itself apart for its collaborative approach to scholarship (see Digital Humanities Questions & Answers for an example of this collaborative approach). As the field is in its relative infancy, many digital humanists are self-taught according to their individual needs and so there isn’t a methodological canon that researchers are taught, which makes collaborating and sharing skillsets an attractive prospect.

Perception of rigour    

As Wicherts et al. demonstrated, there is a correlation between a willingness to share data and the quality of statistical reporting in psychology. Although this is only a correlation, the argument here is that researchers may take more care over the quality and presentation of their data when they have committed to sharing it, and so researchers who routinely share data can build up a reputation for scholarly rigour. Obviously this incentive is less tangible than the previous two, but it is still worth mentioning that Open Data, and openness in general, can contribute to the overall positive reputation of a researcher.

These appear to me to be the immediately obvious incentives for the average researcher to share their data, and as a Panton Fellow I’m looking to explore these further this year. I would be interested to read any I’ve missed!

 

“It’s not only peer-reviewed, it’s reproducible!”

- October 18, 2013 in Panton Fellowships, Panton Principles, Reproducibility

Peer review is one of the oldest and most respected instruments of quality control in science and research. Peer review means that a paper is evaluated by a number of experts on the topic of the article (the peers). The criteria may vary, but most of the time they include methodological and technical soundness, scientific relevance, and presentation.

“Peer-reviewed” is a widely accepted sign of quality of a scientific paper. Peer review has its problems, but you won’t find many researchers that favour a non peer-reviewed paper over a peer-reviewed one. As a result, if you want your paper to be scientifically acknowledged, you most likely have to submit it to a peer-reviewed journal, even though it will take more time and effort to get it published than in a non peer-reviewed publication outlet.

Peer review helps to weed out bad science and pseudo-science, but it also has serious limitations. One of these limitations is that the primary data and other supplementary material such as documentation and source code are usually not available. The results of the paper are thus not reproducible. When I review such a paper, I usually have to trust the authors on a number of issues: that they have described the process of achieving the results as accurately as possible, that they have not left out any crucial pre-processing steps and so on. When I suspect a certain bias in a survey for example, I can only note that in the review, but I cannot test for that bias in the data myself. When the results of an experiment seem to be too good to be true, I cannot inspect the data pre-processing to see if the authors left out any important steps.

As a result, later efforts to reproduce research results can lead to devastating outcomes. Wang et al. (2010), for example, found that almost none of the literature on a certain topic in computer science could be reproduced.

“Reproducible”: a new quality criterion

Needless to say, this is not a very desirable state. Therefore, I argue that we should start promoting a new quality criterion: “reproducible”. Reproducible means that the results achieved in the paper can be reproduced by anyone because all of the necessary supplementary resources have been openly provided along with the paper.

It is easy to see why a peer-reviewed and reproducible paper is of higher quality than just a peer-reviewed one. You do not have to take the researchers’ word for how they calculated their results, because you can reconstruct them yourself. As a welcome side-effect, this would make more datasets and source code openly available. Thus, we could start building on each other’s work and aggregate data from different sources to gain new insights.

In my opinion, reproducible papers could be published alongside non-reproducible papers, just like peer-reviewed articles are usually published alongside editorials, letters, and other non peer-reviewed content. I would think, however, that over time, reproducible would become the overall quality standard of choice – just like peer-reviewed is the preferred standard right now. To help this process, journals and conferences could designate a certain share of their space to reproducible papers. I would imagine that they would not have to do that for too long though. Researchers will aim for a higher quality standard, even if it takes more time and effort.

I do not claim that reproducibility solves all of the problems that we see in science and research right now. For example, it will still be possible to manipulate the data to a certain degree. I do, however, believe that reproducibility as an additional quality criterion would be an important step for open and reproducible science and research.

So that you can say to your colleague one day: “Let’s go with the method described in this paper. It’s not only peer-reviewed, it’s reproducible!”

Expanded Access to the Results of Federally Funded Research

- February 25, 2013 in Announcements

On Friday 22nd February 2013, the U.S. Office of Science and Technology Policy (OSTP) released a statement to say that the “Obama Administration is committed to the proposition that citizens deserve easy access to the results of scientific research their tax dollars have paid for”. This was accompanied by a new policy memorandum and a long-awaited response by OSTP Director John Holdren to the ‘We The People’ petition that was signed by over 65,000 people calling for expanded public access to research.


 

Advocates of green open access were pleased to see this new directive, and Peter Suber in particular gives a nice clear summary of it in a Google+ post. With embargoes of up to 12 months allowed before research can be self-archived, even the Association of American Publishers wrote a statement of support for this new policy.

This policy certainly represents a step in the right direction, but it’s not as strong as some would have liked. Prominent OA advocate and scientist Michael Eisen writes on his blog: “No celebrations here: why the White House public access policy sucks.”

A comparison with the United Kingdom’s RCUK policy clearly shows the OSTP policy to be the weaker of the two:

Breadth: the OSTP policy applies only to scientific research, whereas RCUK’s applies to Arts, Humanities, and Social Sciences research too.

Immediacy: the OSTP policy allows 12-month embargoes, whilst RCUK accepts a maximum embargo of only 6 months for STM research.

Coverage: the OSTP policy applies only to Federal agencies with more than $100M in R&D expenditures, whilst RCUK’s applies to all RCUK-funded research, with no exceptions.

Some would say this is no bad thing. The OSTP policy is certainly more lenient on publishers and is thus likely to be implemented without controversy. Hopes for a stronger OA policy in the USA are buoyed by the recent Fair Access to Science and Technology Research (FASTR) Act, which proposes to shorten the maximum embargo time allowed to just 6 months, in line with RCUK policy.

Finally, the pleasant surprise for everyone in this new OSTP policy is the specific and explicit inclusion of access to data, not just publications. Section 4, titled Objectives for Public Access to Scientific Data in Digital Formats, aims to:

“Maximize access, by the general public and without charge, to digitally formatted scientific data created with Federal funds”

The United States of America has now clearly joined the global movement towards open access to taxpayer-funded research. We think the world will certainly benefit from this new policy.