
All metrics are wrong, but some are useful

- May 31, 2014 in Panton Fellowships

by Leo Reynolds

Altmetrics, web-based metrics for measuring research output, have recently received a lot of attention. Started only in 2010, altmetrics have become a phenomenon both in the scientific community and in the publishing world. This year alone, EBSCO acquired Plum Analytics, Springer added Altmetric information to SpringerLink, and Scopus augmented articles with Mendeley readership statistics.

Altmetrics have a lot of potential. They are usually available earlier than citation-based metrics, allowing for an early evaluation of articles. With altmetrics, it also becomes possible to assess the many outputs of research besides the paper itself – data, source code, presentations, blog posts, and so on.

One of the problems with the recent hype surrounding altmetrics, however, is that it leads some people to believe that altmetrics are somehow intrinsically better than citation-based metrics. They are, of course, not. In fact, if we just replace the impact factor with some aggregate of altmetrics, then we have gained nothing. Let me explain why.

The problem with metrics for evaluation

You might know this famous quote:

“All models are wrong, but some are useful” (George Box)

It refers to the fact that all models are a simplified view of the world. In order to be able to generalize phenomena, we must leave out some of the details. Thus, we can never explain a phenomenon in full with a model, but we might be able to explain the main characteristics of many phenomena that fall in the same category. The models that can do that are the useful ones.

Example of a scientific model, explaining atmospheric composition based on chemical and transport processes. Source: Strategic Plan for the U.S. Climate Change Science Program (Image by Philippe Rekacewicz)

The very same can be said about metrics – with the important caveat that metrics have far less explanatory power than a model. Metrics might tell you something about the world in a quantified way, but for the how and why we need models and theories. Matters become even worse when we are talking about metrics that are generated in the social world rather than the physical world. Humans are notoriously unreliable, and it is hard to pinpoint the motives behind their actions. A paper may be cited, for example, to confirm or to refute a result, or simply to acknowledge it. A paper may be tweeted to showcase good research or to condemn bad research.

In addition, all of these measures are susceptible to gaming. According to ImpactStory, an article with just 54 Mendeley readers is already in the 94th–99th percentile (thanks to Juan Gorraiz for the example). Getting your paper into the top ranks is therefore easy. And even indicators like downloads or views that go into the hundreds of thousands can probably be gamed with a simple script deployed on a couple of university servers around the country. This makes the old citation cartel look pretty labor-intensive, doesn’t it?
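To make the percentile point concrete, here is a minimal sketch with purely synthetic numbers – a long-tailed readership distribution drawn at random, not ImpactStory’s actual data:

```python
# Minimal sketch with synthetic data: in a long-tailed readership
# distribution, a modest reader count already lands in a high percentile.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical reader counts for 10,000 articles (Zipf-like long tail:
# most papers have very few readers, a handful have very many).
readers = rng.zipf(a=2.0, size=10_000)

count = 54  # the reader count from the ImpactStory example
percentile = (readers < count).mean() * 100
print(f"{count} readers is roughly the {percentile:.0f}th percentile in this sample")
```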

Why we still need metrics and how we can better utilize them

Don’t get me wrong: I do not think that we can get by without metrics. Science is still growing exponentially, and therefore we cannot rely on qualitative evaluation alone. There are just too many papers published, too many applications for tenure-track positions submitted, and too many journals and conferences launched each day. In order to address the concerns raised above, however, we need to get away from a single number determining the worth of an article, a publication, or a researcher.

One way to do this would be a more sophisticated evaluation system that is based on many different metrics and that gives context to these metrics. This would require that we work towards a better understanding of how and why measures are generated and how they relate to each other. In analogy to models, we have to find the numbers that give us a good picture of the many facets of a paper – the useful ones.
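As a purely hypothetical sketch of what such context could look like – all metric names, baselines, and numbers below are invented for illustration – each metric could be reported against a field-specific baseline rather than as a single raw score:

```python
# Hypothetical sketch: report each metric relative to a field baseline
# instead of as one raw number. All names and values are invented.
from bisect import bisect_left

field_baselines = {  # sorted reference values per metric (assumed)
    "mendeley_readers": sorted([0, 1, 2, 3, 5, 8, 13, 21, 54, 200]),
    "tweets":           sorted([0, 0, 0, 1, 1, 2, 4, 9, 25, 120]),
}

paper = {"mendeley_readers": 54, "tweets": 4}

def percentile(value, baseline):
    """Share of the reference sample that lies below the given value."""
    return 100 * bisect_left(baseline, value) / len(baseline)

report = {metric: percentile(value, field_baselines[metric])
          for metric, value in paper.items()}
print(report)  # {'mendeley_readers': 80.0, 'tweets': 60.0}
```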

As I have argued before, visualization would be a good way to represent the different dimensions of a paper and its context. Furthermore, the way the metrics are generated must be open and transparent to make gaming of the system more difficult, and to expose the biases that are inherent in human-generated data. Last, and probably most crucial: we, the researchers and research evaluators, must critically review the metrics that are served to us.

Altmetrics not only give us new tools for evaluation; their introduction also presents us with the opportunity to revisit academic evaluation as such – let’s seize this opportunity!

New version of open source visualization Head Start released

- February 24, 2014 in Panton Fellowships

In July last year, I released the first version of a knowledge domain visualization called Head Start. Head Start is intended for scholars who want to get an overview of a research field. They could be young PhDs getting into a new field, or established scholars who venture into a neighboring field. The idea is that you can see the main areas and papers in a field at a glance without having to do weeks of searching and reading.

Interface of Head Start

You can find an application for the field of educational technology on Mendeley Labs. Papers are grouped by research area, and you can zoom into each area to see the individual papers’ metadata and a preview (or the full text in the case of open access publications). The closer two areas are, the more related they are subject-wise. The prototype is based on readership data from the online reference management system Mendeley. The idea is that the more often two papers are read together, the closer they are subject-wise. More information on this approach can be found in my dissertation (see chapter 5), or if you like it a bit shorter, in this paper and in this paper.
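The co-readership idea itself fits into a few lines. The sketch below uses invented reader libraries rather than the Mendeley data behind the prototype:

```python
# Toy sketch of co-readership: two papers are similar if many readers
# have both of them in their library. All data is invented.
from itertools import combinations
from collections import Counter

libraries = {  # hypothetical reader -> papers in their library
    "reader_1": {"paper_a", "paper_b"},
    "reader_2": {"paper_a", "paper_b", "paper_c"},
    "reader_3": {"paper_b", "paper_c"},
}

co_readership = Counter()
for papers in libraries.values():
    for pair in combinations(sorted(papers), 2):
        co_readership[pair] += 1

print(co_readership.most_common())
# [(('paper_a', 'paper_b'), 2), (('paper_b', 'paper_c'), 2), (('paper_a', 'paper_c'), 1)]
```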

Head Start is a web application built with D3.js. The first version worked very well in terms of user interaction, but it was a nightmare to extend and maintain. Luckily, Philipp Weißensteiner, a student at Graz University of Technology, became interested in the project. Philipp worked on the visualization as part of his bachelor’s thesis at the Know-Center. Not only did he modularize the source code, he also introduced a JavaScript finite state machine that lets you easily describe the different states of the visualization. Setting up a new instance of Head Start is now only a matter of a couple of lines. Philipp also developed a cool proof of concept for his approach: a visualization that shows the evolution of a research field over time using small multiples. You can find his excellent bachelor’s thesis in the repository (German).
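As a rough illustration of the state-machine idea – this is a generic sketch, not the JavaScript library actually used in Head Start – the visualization states and the transitions between them can be written down as a simple table:

```python
# Generic sketch of a finite state machine for visualization states.
# States, events, and transitions are illustrative, not Head Start's actual ones.
TRANSITIONS = {
    ("overview", "zoom_in"):      "area",      # click on a research area
    ("area",     "select_paper"): "paper",     # open a paper preview
    ("paper",    "deselect"):     "area",
    ("area",     "zoom_out"):     "overview",
}

def step(state, event):
    """Return the next state, or stay in the current one if the event is not allowed."""
    return TRANSITIONS.get((state, event), state)

state = "overview"
for event in ["zoom_in", "select_paper", "deselect", "zoom_out"]:
    state = step(state, event)
    print(f"{event} -> {state}")
```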

Head Start Timeline View

In addition, I cleaned up the pre-processing scripts that do all the clustering, ordination and naming. The only thing that you need to get started is a list of publications and their metadata as well as a file containing similarity values between papers. Originally, the similarity values were based on readership co-occurrence, but there are many other measures that you can use (e.g. the number of keywords or tags that two papers have in common).
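A hypothetical sketch of these two inputs might look like the following; the file name, fields, and tag-based similarity are assumptions for illustration, not the exact Head Start format:

```python
# Hypothetical sketch of the two inputs: paper metadata plus pairwise
# similarity values, here derived from shared tags. Names and fields
# are assumptions, not the exact Head Start format.
import csv
from itertools import combinations

papers = {  # invented metadata
    "p1": {"title": "Paper one",   "tags": {"altmetrics", "visualization"}},
    "p2": {"title": "Paper two",   "tags": {"altmetrics", "readership"}},
    "p3": {"title": "Paper three", "tags": {"visualization"}},
}

with open("similarities.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["paper_a", "paper_b", "similarity"])
    for a, b in combinations(sorted(papers), 2):
        shared_tags = len(papers[a]["tags"] & papers[b]["tags"])
        writer.writerow([a, b, shared_tags])
```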

So without further ado, here is the link to the Github repository. If you have any questions or comments, please send them to me or leave a comment below.

First Quarterly Report on my Panton Fellowship Activities

- January 15, 2014 in Panton Fellowships

by jakeandlindsay

I am now a little more than three months into my Panton Fellowship. This means it is time to give an overview of my activities so far. As outlined in my initial blog post, there are two main objectives of my fellowship: working on open and transparent altmetrics, and the promotion of open science.

Regarding the promotion of open science, I would like to highlight two local activities first. Since September, I have contributed to a monthly sum-up of open science activities in the German-speaking world and beyond, in order to make these activities more visible within the local community. You can find the sum-ups (only available in German) here: September, October, November, December. At this point, I would like to give a big shout-out to the other contributors: Christopher Kittel, Stefan Kasberger, and Matthias Fromm.

I was also a panelist at the kick-off event of the openscienceASAP platform in Graz, entitled “The Changing Face of Science: Is Open Science the Future?”. openscienceASAP promotes open science as a practice, and this event was intended as a forum for interested students, researchers, and the general public. It ended up being a very lively discussion that covered a lot of ground, including open access, open peer review, altmetrics, open data, and so forth.

Regarding wider community work, I have started to develop an open data policy for the International Journal of Technology Enhanced Learning. IJTEL will become one of the first journals in the field to have such a policy, and hopefully this will inspire others to follow suit. Furthermore, in my role as an advocate for reproducibility, I wrote a blog post on why reproducibility should become a quality criterion in science. The post sparked a lot of discussion, and was widely linked and tweeted.

The fellowship also enabled me to attend several other events related to open science: in September, I went to OKCon in Geneva, and in November I attended SpotOn in London. Furthermore, I attended a meeting of the Leibniz research network “Science 2.0” in Berlin. These events were a great experience for me. I learned a lot, and I met many new and wonderful people who are passionate about open science.

I also used these events to discuss my second objective: the need for open and transparent altmetrics. Altmetrics will be the main focus of the second quarter of my fellowship. I will be looking at different altmetrics sources and how they can be used for aggregation and visualization. To kickstart these activities, I have outlined my thoughts on the topic in this blog post. Furthermore, I helped to organize an OKFN Open Science Meetup in Vienna on the topic. I also gave an introduction to altmetrics on that occasion; the slides can be found here.

The first three months of my fellowship were a busy yet wonderful time. Besides the activities above, I finally finished my PhD on altmetrics-based visualization. Now I am off for a three-month visit to the Personalized Adaptive Web Systems Lab at the University of Pittsburgh. I cannot wait to see what the second quarter has in store for me! As always, please get in touch if you have any questions or comments, or in case you want to collaborate on one project or another.

Open and transparent altmetrics for discovery

- December 9, 2013 in Panton Fellowships, Research, Tools

by AG Cann

Altmetrics are a hot topic in the scientific community right now. Classic citation-based indicators such as the impact factor are being complemented by alternative metrics generated from online platforms. Usage statistics (downloads, readership) are often employed, but links, likes, and shares on the web and in social media are considered as well. The altmetrics promise, as laid out in the excellent manifesto, is that they assess impact more quickly and on a broader scale.

The main focus of altmetrics at the moment is evaluation of scientific output. Examples are the article-level metrics in PLOS journals, and the Altmetric donut. ImpactStory has a slightly different focus, as it aims to evaluate the oeuvre of an author rather than an individual paper.

This is all well and good, but in my opinion, altmetrics have a huge potential for discovery that goes beyond rankings of top papers and researchers – a potential that is largely untapped so far.

How so? To answer this question, it is helpful to shed a little light on the history of citation indices.

Pathways through science

In 1955, Eugene Garfield created the Science Citation Index (SCI), which later went on to become the Web of Knowledge. His initial idea – besides measuring impact – was to record citations in a large index to create pathways through science. This makes it possible to link papers that are not connected by shared keywords. It makes a lot of sense: you can talk about the same thing using totally different terminology, especially when you are not in the same field. Furthermore, terminology has proven to be very fluid, even within the same domain (Leydesdorff 1997). In 1973, Small and Marshakova realized – independently of each other – that co-citation is a measure of subject similarity and can therefore be used to map a scientific field.
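Co-citation counting is a simple co-occurrence pattern; the toy sketch below uses invented reference lists:

```python
# Toy sketch of co-citation counting: two papers are co-cited whenever
# a third paper cites both of them. Reference lists are invented.
from itertools import combinations
from collections import Counter

references = {  # citing paper -> papers it cites
    "citing_1": {"paper_a", "paper_b"},
    "citing_2": {"paper_a", "paper_b", "paper_c"},
    "citing_3": {"paper_a", "paper_c"},
}

co_citations = Counter(
    pair
    for refs in references.values()
    for pair in combinations(sorted(refs), 2)
)
print(co_citations.most_common(1))  # [(('paper_a', 'paper_b'), 2)]
```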

Because citations accrue with considerable delay, however, co-citation maps are often a look into the past rather than a timely overview of a scientific field.

Altmetrics for discovery

In come altmetrics. Similarly to citations, they can create pathways through science. After all, a citation is nothing else but a link to another paper. With altmetrics, it is not so much which papers are often referenced together, but rather which papers are often accessed, read, or linked together. The main advantage of altmetrics, as with impact measurement, is that they are available much earlier.

Bollen et al. (2009): Clickstream Data Yields High-Resolution Maps of Science. PLOS One. DOI: 10.1371/journal.pone.0004803.

One of the efforts in this direction is the work of Bollen et al. (2009) on clickstreams. Using sequences of clicks to different journals, they created a map of science (see above).

In my PhD, I looked at the potential of readership statistics for knowledge domain visualizations. It turns out that co-readership is a good indicator for subject similarity. This allowed me to visualize the field of educational technology based on Mendeley readership data (see below). You can find the web visualization called Head Start here and the code here (username: anonymous, leave password blank).

http://labs.mendeley.com/headstart

Why we need open and transparent altmetrics

The evaluation of Head Start showed that the overview is indeed more timely than maps based on citations. However, it also provided further evidence that altmetrics are prone to sample biases. In the visualization of educational technology, the computer-science-driven areas such as adaptive hypermedia are largely missing. Bollen and Van de Sompel (2008) reported the same problem when they compared rankings based on usage data to rankings based on the impact factor.

It is therefore important that altmetrics are transparent and reproducible, and that the underlying data is openly available. This is the only way to ensure that all possible biases can be understood.

As part of my Panton Fellowship, I will try to find datasets that satisfy these criteria. There are several sources of open bibliometric data, such as the Mendeley API and the figshare API, which have adopted CC BY, but most usage data is either not publicly available or cannot be redistributed. In my fellowship, I want to evaluate the goodness of fit of different open altmetrics data. Furthermore, I plan to create more knowledge domain visualizations such as the one above.

So if you know of any good datasets, please leave a comment below. Of course, any other comments on the idea are much appreciated as well.