
Second Quarterly Report on my Panton Fellowship

- March 26, 2014 in Panton Fellowships

by Timothy Appnel

I am now almost halfway through my Panton Fellowship, so it is time to sum up my activities once again.

The most important activity in the last quarter was surely the work on the open source visualization Head Start. Head Start is intended to give scholars an overview of a research field. You can find out all about the initial release in this blog post. In the last few weeks, I was busy with bug fixing and stability improvements. I also refactored the whole pre-processing system and further integrated the work of Philipp Weißensteiner on time-series visualization. If you are interested in trying out Head Start, or – even better – would like to contribute to its development, check out the GitHub repository.

Furthermore, I attended the Science Online un-conference in Raleigh (February 27 to March 1). Scio14 was very inspiring and engaging. Cameron Neylon hosted a great session on imagining the far future of academic publishing. In Rachel Levy's workshop on visualizations, we reflected on our own visualizations, and there were tons of tips for improving one's work. Other great sessions included post-publication peer review (with Ivan Oransky), altmetrics (facilitated by Cesar Berrios-Otero), and alternate careers in science (led by Eva Amsen). I also encourage you to check out the videos of the keynotes, which include a very inspiring talk by Rebecca Tripp and Meg Lowman on neglected audiences in science, and the awesome crowd-sourced 3D printing project for creating prosthetic hands by Nick Parker and Jon Schull.

Let's move on to my work for the local Austrian community. Together with my fellow OKFN members Sylvia Petrovic-Majer, Stefan Kasberger, and Christopher Kittel, I became active (remotely for now) in the Open Access Network Austria (OANA). Specifically, I am contributing to the working group "Involvement of researchers in open access". I am very excited about this opportunity, as one of the objectives of my Panton Fellowship is to draw more researchers into open science.

What else? Earlier this year, I was interviewed for the openscienceASAP podcast. In the interview, I talked about altmetrics, the need for an inclusive approach to open science, and the Panton Fellowships. You can find the podcast here (in German). If you have read my last report, you may remember that I spoke on a panel about open science at the University of Graz. The video of the panel (in German) is now online and can be found here. Furthermore, I'd like to draw your attention to the monthly sum-ups of open science activities in the German-speaking world and beyond: January, February.

So what will my next quarter look like? As you may remember from my last report, I am currently a visiting scholar at the University of Pittsburgh. In the weeks to come, I will integrate Head Start with Conference Navigator 3, developed by the great folks of the PAWS Lab here in Pittsburgh. Conference Navigator is a nifty scheduling system that allows you to create a personal conference schedule by bookmarking talks from the program. The system then gives you recommendations for further talks based on your choices. Head Start will be used as an alternative way of looking at the topics of the conference, and to give better context to the talks that you have already selected. I will return to Austria in June, just in time for Peter Murray-Rust's visit to Vienna. There are already a lot of activities planned around his stay, and I am very much looking forward to it. As always, please get in touch if you have any questions or comments, or if you want to collaborate on one project or another.

New version of open source visualization Head Start released

- February 24, 2014 in Panton Fellowships

In July last year, I released the first version of a knowledge domain visualization called Head Start. Head Start is intended for scholars who want to get an overview of a research field. They could be young PhDs getting into a new field, or established scholars who venture into a neighboring field. The idea is that you can see the main areas and papers in a field at a glance without having to do weeks of searching and reading.

Interface of Head Start

You can find an application for the field of educational technology on Mendeley Labs. Papers are grouped by research area, and you can zoom into each area to see the individual papers' metadata and a preview (or the full text in the case of open access publications). The closer two areas are, the more related they are subject-wise. The prototype is based on readership data from the online reference management system Mendeley. The idea is that the more often two papers are read together, the closer they are subject-wise. More information on this approach can be found in my dissertation (see chapter 5), or, if you prefer it a bit shorter, in this paper and in this paper.
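The co-readership idea is simple to sketch. The following toy example (hypothetical data and paper IDs, not Head Start's actual pre-processing code) counts how often each pair of papers appears together in users' libraries:

```python
from itertools import combinations
from collections import Counter

# Hypothetical example: each user's reference library as a set of paper IDs.
libraries = [
    {"p1", "p2", "p3"},
    {"p1", "p2"},
    {"p2", "p3", "p4"},
]

# Count how often each pair of papers is read together (co-readership).
co_read = Counter()
for lib in libraries:
    for a, b in combinations(sorted(lib), 2):
        co_read[(a, b)] += 1

# The more often two papers are read together, the more similar they are
# assumed to be: p1 and p2 co-occur in two libraries, p1 and p4 in none.
print(co_read[("p1", "p2")])  # 2
print(co_read[("p1", "p4")])  # 0
```

These raw co-occurrence counts would then be normalized into similarity values before any clustering takes place.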

Head Start is a web application built with D3.js. The first version worked very well in terms of user interaction, but it was a nightmare to extend and maintain. Luckily, Philipp Weißensteiner, a student at Graz University of Technology, became interested in the project. Philipp worked on the visualization as part of his bachelor's thesis at the Know-Center. Not only did he modularize the source code, he also introduced a JavaScript finite state machine that lets you easily describe the different states of the visualization. Setting up a new instance of Head Start is now only a matter of a couple of lines. Philipp also developed a cool proof of concept for his approach: a visualization that shows the evolution of a research field over time using small multiples. You can find his excellent bachelor's thesis in the repository (in German).

Head Start Timeline View

In addition, I cleaned up the pre-processing scripts that do all the clustering, ordination, and naming. The only things you need to get started are a list of publications with their metadata and a file containing similarity values between papers. Originally, the similarity values were based on readership co-occurrence, but there are many other measures you can use (e.g. the number of keywords or tags that two papers have in common).
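To illustrate what such a clustering step works with, here is a minimal sketch (hypothetical similarity values; the actual scripts in the repository differ in detail) that turns a pairwise similarity matrix into research-area clusters using SciPy's hierarchical clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical similarity values between four papers (symmetric, in [0, 1]).
papers = ["p1", "p2", "p3", "p4"]
sim = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
])

# Convert similarities to distances and cluster hierarchically.
dist = 1.0 - sim
np.fill_diagonal(dist, 0.0)
condensed = squareform(dist)  # condensed distance vector for linkage()
clusters = fcluster(linkage(condensed, method="average"),
                    t=2, criterion="maxclust")

# p1/p2 end up in one cluster, p3/p4 in the other.
print(dict(zip(papers, clusters)))
```

An ordination step (e.g. multidimensional scaling on the same distance matrix) would then place the clusters on the 2D canvas so that similar areas sit close together.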

So without further ado, here is the link to the GitHub repository. If you have any questions or comments, please send them to me or leave a comment below.

“It’s not only peer-reviewed, it’s reproducible!”

- October 18, 2013 in Panton Fellowships, Panton Principles, Reproducibility

Peer review is one of the oldest and most respected instruments of quality control in science and research. Peer review means that a paper is evaluated by a number of experts on the topic of the article (the peers). The criteria may vary, but most of the time they include methodological and technical soundness, scientific relevance, and presentation.

"Peer-reviewed" is a widely accepted sign of quality of a scientific paper. Peer review has its problems, but you won't find many researchers who favour a non-peer-reviewed paper over a peer-reviewed one. As a result, if you want your paper to be scientifically acknowledged, you most likely have to submit it to a peer-reviewed journal, even though it will take more time and effort to get it published than in a non-peer-reviewed outlet.

Peer review helps to weed out bad science and pseudo-science, but it also has serious limitations. One of these limitations is that the primary data and other supplementary material, such as documentation and source code, are usually not available. The results of the paper are thus not reproducible. When I review such a paper, I usually have to trust the authors on a number of issues: that they have described the process of achieving the results as accurately as possible, that they have not left out any crucial pre-processing steps, and so on. When I suspect a certain bias in a survey, for example, I can only note that in the review; I cannot test for that bias in the data myself. When the results of an experiment seem too good to be true, I cannot inspect the data pre-processing to see if the authors left out any important steps.

As a result, later efforts to reproduce research results can lead to devastating outcomes. Wang et al. (2010), for example, found that they could not reproduce almost all of the literature on a certain topic in computer science.

“Reproducible”: a new quality criterion

Needless to say, this is not a very desirable state. Therefore, I argue that we should start promoting a new quality criterion: "reproducible". Reproducible means that the results achieved in the paper can be reproduced by anyone, because all of the necessary supplementary resources have been openly provided along with the paper.

It is easy to see why a peer-reviewed and reproducible paper is of higher quality than one that is merely peer-reviewed. You do not have to take the researchers' word for how they calculated their results – you can reconstruct them yourself. As a welcome side effect, this would make more datasets and source code openly available. Thus, we could start building on each other's work and aggregate data from different sources to gain new insights.

In my opinion, reproducible papers could be published alongside non-reproducible papers, just like peer-reviewed articles are usually published alongside editorials, letters, and other non-peer-reviewed content. I would think, however, that over time, reproducible would become the overall quality standard of choice – just like peer-reviewed is the preferred standard right now. To help this process, journals and conferences could designate a certain share of their space to reproducible papers. I would imagine that they would not have to do that for too long, though. Researchers will aim for a higher quality standard, even if it takes more time and effort.

I do not claim that reproducibility solves all of the problems that we see in science and research right now. For example, it will still be possible to manipulate the data to a certain degree. I do, however, believe that reproducibility as an additional quality criterion would be an important step for open and reproducible science and research.

So that you can say to your colleague one day: “Let’s go with the method described in this paper. It’s not only peer-reviewed, it’s reproducible!”