Open Training for Open Science

- December 21, 2014 in Featured, Reproducibility, Research, Tools

This is part of a series of blog posts highlighting focus points for the Open Science Working Group in 2015. These draw on activities started in the community during 2014 and suggestions from the Working Group Advisory Board.

By opensourceway on Flickr under CC-BY-SA 2.0

The Open Science Working Group has long supported training for open science and early introduction of the principles of open and reproducible research in higher education (if not before!). This area was a focus in 2013-14 and grows in importance as we enter 2015, with interest in openness in science increasing rapidly. This post provides examples of training initiatives in which members of the working group have been involved and highlights particular areas where work is lacking.

  1. Openness in higher education
  2. Strategies for training in open science
  3. The Open Science Training Initiative (OSTI)
  4. Developing Open Science Training Curricula
  5. Incorporating Open Science Training into Current Courses
  6. Conclusion
  7. Getting Involved in Open Science Training

Openness in higher education

Openness has the potential to radically alter the higher education experience. For instance, Joss Winn and Mike Neary posit that democratisation of participation and access could allow a reconstruction of the student experience in higher education to achieve social relevance. They propose:

“To reconstruct the student as producer: undergraduate students working in collaboration with academics to create work of social importance that is full of academic content and value, while at the same time reinvigorating the university beyond the logic of market economics.”[1]

Openness focuses on sharing and collaboration for public good, at odds with the often competitive ethos in research and education. This involves more than simply implementing particular pedagogies or publishing open access articles – as Peters and Britez state bluntly in the first sentence of their book on open education:

“Open education involves a commitment to openness and is therefore inevitably a political and social project.” [2]

This could equally apply to open science. Openness is a cultural shift that is facilitated but not driven by legal and technical tools. In open education, for instance, open pedagogy makes use of now abundant openly licensed content but also places an emphasis on the social network of participants and the learner’s connections within this, emphasising that opening up the social institution of higher education is the true transformation. In open science, a lot of training focuses on the ability to manage and share research data, understand licensing and use new digital tools including training in coding and software engineering. However, understanding the social and cultural environment in which research takes place and how openness could impact that is arguably even more fundamental.

This section will focus on three topics around open science training, offering relevant linkages to educational literature and suggestions for teaching design:

  1. Use of open data and other open research objects in higher education.
  2. Use of open science approaches for research-based learning.
  3. Strategies for training in open science.

Strategies for training in open science

As openness is a culture and mindset, a socio-cultural approach to learning and the construction of appropriate learning environments are essential. While Neary and Winn [1] focus on the student as producer, Sophie Kay [3] argues that this focus can be detrimental because it neglects the role of students as research consumers, which in turn neglects their ability to produce research outputs that are easily understood and reusable.

Training in evolving methods of scholarly communication is imperative because there are major policy shifts towards a requirement for open research outputs at both the funder and learned society levels in the UK, EU and US. This is in addition to a growing grassroots movement in scientific communities, accelerated by the enormous shifts in research practice and wider culture brought about by pervasive use of the internet and digital technologies. The current generation of doctoral candidates is the first generation of ‘digital natives’: those who have grown up with the world wide web, where information is expected to be available on demand and where ‘prosumers’, who consume media and information as well as producing their own via social media sites, are the norm. This norm is not reflected in most current scientific practice, where knowledge dissemination is still largely based on a journal system founded in the 1600s, albeit now in digital format. Current evidence suggests that students are not prepared for change. For example, a major study of 17,000 UK graduate students [4] revealed that students:

  • hold many misconceptions about open access publishing, copyright and intellectual property rights;
  • are slow to utilise the latest technology and tools in their research work, despite being proficient in IT;
  • are influenced by the methods, practices and views of their immediate peers and colleagues.

While pre-doctoral training is just as important, the majority of open science training initiatives documented thus far have aimed at the early career research stage, including doctoral students.

The Open Science Training Initiative (OSTI)

Photo courtesy of Sophie Kay, licensed under CC-BY.

Open Knowledge Panton Fellow Sophie Kay developed an Open Science Training Initiative (OSTI) [3], trialled in the Life Science Interface Doctoral Training Centre at the University of Oxford. It employs ‘rotation based learning’ (RBL) to cement the role of students as both producers and consumers of research through learning activities which promote the communication of coherent research stories that maximise reproducibility and usefulness. The content involves a series of mini-lectures around the concepts, tools and skills required to practise openly, including an awareness of intellectual property rights and licensing, digital tools and services for collaboration, storage and dissemination, scholarly communication and the broader cultural contexts of open science.

The novel pedagogical approach was to create groups during an initiator phase, in which each group reproduces and documents a scientific paper, ensuring that outputs are in appropriate formats and properly licensed. Next, in the successor phase, the reproduced work is rotated to another group who must validate and build upon it in the manner of a novel research project, with daily short meetings with instructors to address any major issues. No intergroup communication is allowed during either phase, meaning that deficiencies in documentation and sticking points become obvious, which hopefully leads to greater awareness among students of the adequacy of their future documentation. The pilot course involved 43 students and had a subject-specific focus on computational biology. Feedback was excellent, with students feeling that they had learnt more about scientific working practices and indicating they were highly likely to incorporate ideas introduced during the course into their own practice.

This course design offers great scope for inter-institutional working and as it uses OERs the same training can be delivered in several locations but remains adaptable to local needs. RBL would be more challenging to mirror in wet labs but could be adapted for these settings and anyone is encouraged to remix and run their own instance. Sophie is especially keen to see the materials translated into further languages.

Developing Open Science Training Curricula

OSTI is one of the first courses to specifically address open science training but is likely the first of many, as funding is becoming available from the European Commission and other organisations specifically aimed at developing open access and open science resources and pedagogies. Some of the key considerations for teaching design in this space are:

  1. How to address socio-cultural aspects in addition to imparting knowledge about legal and technical tools or subject-specific content and skills training.
  2. The current attitudes and perceptions of students towards intellectual property and the use of digital technologies and how this will impact their learning.
  3. The fast pace of change in policy requirements and researcher attitudes to aspects of open science.
  4. Additional time and resources required to run additional courses vs amelioration of existing activities.

Open science curriculum map at MozFest 2014. Photo by Jenny Molloy, dedicated to the public domain via a CCZero waiver.

There are numerous one-off training events happening around the world, for instance the series of events funded by the FOSTER EU programme, which includes many workshops on open science. There is also informal training through organisations such as the local groups of the Open Science Working Group. Open science principles are incorporated into certain domain-specific conferences or skill-specific programmes like Software Carpentry Workshops, which have a solid focus on reproducibility and openness alongside teaching software engineering skills to researchers.

There are no established programmes and limited examples of open science principles incorporated into undergraduate or graduate curricula across an entire module or course. Historically, there have been experiments with Open Notebook Science, for instance Jean-Claude Bradley’s work used undergraduates to crowdsource solubility data for chemical compounds. Anna Croft from Bangor University presented her experiences encouraging chemistry undergraduates to use open notebooks at OKCon 2011 and found that competition between students was a barrier to uptake. At a graduate level, Brian Nosek has taught research methods courses incorporating principles of openness and reproducibility (Syllabus) and a course on improving research (Syllabus). The Center for Open Science, headed by Nosek, also has a Collaborative Replications and Education Project (CREP), which is an excellent embodiment of the student as producer model and incorporates many aspects of open and reproducible science through encouraging students to replicate studies. More on this later!

It is clear that curricula, teaching resources and ideas would be useful to open science instructors and trainers at this stage. Billy Meinke and Fabiana Kubke helpfully delved into a skills-based curriculum in more depth during Mozilla Festival 2014 with their mapping session. Bill Mills of Mozilla Science Lab recently published a blog post on a similar theme and has started a pad to collate further information on current training programmes for open science. In the US, NCEAS ran a workshop developing a curriculum for reproducible science followed by a workshop on Open Science for Synthesis.

NESCent ran a curriculum building workshop in Dec 2014 (see wiki). Several participants in the workshop have taught their own courses on Tools for Reproducible Research (Karl Broman) or reproducibility in statistics courses (Jenny Bryan). This workshop was heavily weighted to computational and statistical research and favoured R as the tool of choice. Interestingly their curriculum looked very different to the MozFest map, which goes to show the breadth of perspectives on open science within various communities of researchers!

All of these are excellent starts to the conversation and you should contribute where possible! There is a strong focus on data-rich, computational science so work remains to rethink training for the wet lab sciences. Of the branches of skills identified by Billy and Fabiana, only two of seven relate directly to computational skills, suggesting that there is plenty of work to be done! For further ideas and inspiration, the following section details some ways in which the skills can be further integrated into the curriculum through existing teaching activities.

Skills map for Reproducible, Open and Collaborative Science. Billy Meinke and Fabiana Kubke's session at MozFest 2014.

Incorporating Open Science Training into Current Courses

Using the open literature to teach about reproducibility

Data and software are increasingly published alongside papers, ostensibly enabling reproduction of research. When students try to reanalyse, replicate or reproduce research as a teaching activity they are developing and using skills in statistical analysis, programming and more, in addition to gaining exposure to the primary literature. As much published science is not reproducible, limitations of research documentation and experimental design or analysis techniques may become more obvious, providing a useful experiential lesson.

There is public benefit to this type of analysis. Whether works are reproducible is increasingly of interest, particularly in computational research, and various standards and marks of reproducibility have been proposed. However, the literature is vast and no mechanism for systematic retrospective verification and demarcation of reproducibility is under wide consideration. Performing this using thousands of students in the relevant discipline could rapidly crowdsource the desired information while fitting easily into standard components of current curricula and offering a valid and useful learning experience.

The effect of ‘many eyes’ engaging in post-publication peer review and being trained in reviewing may also throw up substantive errors beyond a lack of information or technical barriers to reproduction. The highest-profile example of this is the discovery by graduate student Thomas Herndon of serious flaws in a prominent economics paper when he tried to replicate its findings [5,6]. These included coding errors, selective exclusion of data and unconventional weighting of statistics, meaning that a result which was highly cited by advocates of economic austerity measures and had clear potential to influence fiscal policy was in fact spurious. This case study provides a fantastic example of the need for open data and the social and academic value of reanalysis by students, with the support of faculty.

This possibility has not been picked up in many disciplines, but the aforementioned CREP project aims to perform just such a crowd-sourced analysis and asks instructors to consider what might be possible through student replication. Grahe et al. suggest that:

“Each year, thousands of undergraduate projects are completed as part of the educational experience…these projects could meet the needs of recent calls for increased replications of psychological studies while simultaneously benefiting the student researchers, their instructors, and the field in general.” [7]

Frank and Saxe [8] support this promise, reporting that they found teaching replication to be enjoyable for staff and students and an excellent vehicle for educating about the importance of reporting standards, and the value of openness. Both publications suggest approaches to achieving this in the classroom and are well worth reading for further consideration and discussion about the idea.

Reanalysing open data

One step from reproduction of the original results is the ability to play with data and code. Reanalysis using different models or varying parameters to shift the focus of the analysis can be very useful, with recognition of the limitations of experimental design and the aims of the original work. This leads us to the real potential for novel research using open datasets. Some fields lend themselves to this more than others. For example, more than 50% of public health masters projects across three courses examined by Feldman et al. [9] used secondary data for their analyses rather than acquiring expensive and often long-term primary datasets. Analysis of large and complex public health data is a vital graduate competency, therefore the opportunity to grapple with the issues and complexities of real data rather than a carefully selected or contrived training set is vital.

McAuley et al. [10] suggest that the potential to generate linked data, e.g. interconnecting social data, health statistics and travel information, is the real power of open data and can produce highly engaging educational experiences. Moving beyond educational value, Feldman et al. [9] argue that open data use in higher education research projects allows for a more rapid translation of science to practice. However, this can only be true if that research is itself shared with the wider community of practice, as advocated by Lombardi [11]. This can be accomplished through the canonical scientific publishing track or using web tools and services such as the figshare or CKAN open data repositories, code sharing sites and wikis or blogs to share discoveries.

To use these digital tools, which form the bedrock of many open science projects and are slowly becoming fully integrated into scholarly communication systems, technological skills and an understanding of the process of knowledge production and dissemination in the sciences are required. Students should be able to contextualise these resources within the scientific process to prepare them for a future in a research culture that is being rapidly altered by digital technologies. All of these topics, including the specific tools mentioned above, are covered by the ROCS skills mapping from MozFest, demonstrating that the same requirements are coming up repeatedly and independently.

Use of open science approaches for research-based learning

There are several powerful arguments as to why engaging students in research-based activities leads to higher level and higher quality learning in higher education and the Boyer Commission on Educating Undergraduates in the Research University called for research based learning to become the standard, stating a desire:

“…to turn the prevailing undergraduate culture of receivers into a culture of inquirers, a culture in which faculty, graduate students, and undergraduates share an adventure of discovery.”

The previous section emphasised the potential role of open content, namely papers, data and code, in research-based learning. In addition, the growing number of research projects open to participation by all – including those designated as citizen science – can offer opportunities to engage in research that scales and contributes more usefully to science than small research projects that may be undertaken in a typical institution as part of an undergraduate course. These open science activities offer options for both wet and dry lab based activities in place of or in addition to standard practical labs and field courses.

The idea of collaborative projects between institutions and even globally is not new. Involvement in FOSS projects for computational subjects has long been recognised as an excellent opportunity to get experience of collaborative coding in large projects with real, often complex code bases and a ‘world-size laboratory’ [12]. In wet lab research there are examples of collaborative lab projects between institutions which have been found to cut costs and resources as well as increasing the sample size of experiments performed to give publishable data [13]. Openness offers scaling opportunities to inter-institutional projects which might otherwise not exist by increasing their visibility and removing barriers to further collaborative partners joining.

Tweet from @O_S_M requesting assistance synthesising molecules.

There are several open and citizen science projects which may offer particular scope for research-based learning. One could be the use of ecology field trips and practicals to contribute to the surveys conducted by organisations such as the UK Biological Records Centre, thus providing useful data contributions and access to a wider but directly relevant dataset for students to analyse. NutNet is a global research cooperative which sets up node sites to collect ecosystem dynamics data using standard protocols for comparison across sites globally. As this is a longitudinal study with most measurements being taken only a couple of times a year, it offers good scope for practical labs. On a more ad hoc basis, projects such as Open Source Malaria offer many project and contribution opportunities, e.g. a request to help make molecules on their wishlist and a GitHub-hosted to-do list. One way of incorporating these into curricula is through team challenges in a similar vein to the iGEM synthetic biology project, which involves teams of undergraduates making bacteria with novel capabilities and contributes the DNA modules engineered to a public database of parts known as BioBricks.

In conclusion, open and citizen science projects which utilise the internet to bring together networks of people to contribute to live projects could be incorporated into inquiry-based learning in higher education to the benefit of both students and the chosen projects, allowing students to contribute truly scientifically and socially important data in the ‘student as producer’ model while maintaining the documented benefits of research-based pedagogies. These activities range from controlled contributions that practise particular skills through to discovery-oriented tasks and challenges such as iGEM, which allow students to generate research questions independently.

There are significant challenges in implementing these types of research-based activities, many of which are also true of ‘non-open’ projects. For instance, there are considerations around mechanisms of participation and sharing processes and outputs. Assessment becomes more challenging as students are collaborating rather than providing individual evidence of attainment. As work is done in the open, provenance and sharing of ideas require tracking.

Conclusion

This post has introduced some ideas for teaching open science focusing on the student as both a producer and consumer of knowledge. The majority of suggestions have centred around inquiry-based learning as this brings students closer to research practices and allows social and cultural aspects of science and research to be embedded in learning experiences.

Explicitly articulating the learning aims and values that are driving the teaching design would be useful to enable students to critique them and arrive at their own conclusions about whether they agree with openness as a default condition. There is currently little systematic evidence for the proposed benefits of open science, partly because it is not widely practised in many disciplines and also as a result of the difficulty of designing research to show direct causality. Therefore, using evidence-based teaching practices that attempt to train students as scientists and critical thinkers without exposing the underlying principles of why and how they’re being taught would not be in the spirit of the exercise.

Support for increased openness and a belief that it will lead to better science is growing, so the response of the next generation of scientists and their decision about whether to incorporate these practices into their work has great implications for future research cultures and communities. At the very least, exposure to these ideas during under- and postgraduate training will enable students to be aware of them during their research careers and make more informed decisions about their practices, values and aims as researchers. There are exciting times ahead in science teaching!

If you’ve found this interesting, please get involved with a growing number of like-minded people via the pointers below!

Getting Involved in Open Science Training

More projects people could get involved with? Add them to the comments and the post will be updated.

References

  1. Neary, M., & Winn, J. (2009). The student as producer: reinventing the student experience in higher education.
  2. Peters, M. A., & Britez, R. G. (Eds.). (2008). Open education and education for openness. Sense Publishers.
  3. For a peer-reviewed paper on the OSTI initiative, see Kershaw, S.K. (2013). Hybridised Open Educational Resources and Rotation Based Learning. Open Education 2030. JRC−IPTS Vision Papers. Part III: Higher Education (pp. 140-144). Link to the paper in Academia.edu
  4. Carpenter, J., Wetheridge, L., Smith, N., Goodman, M., & Struijvé, O. (2010). Researchers of Tomorrow: A Three Year (BL/JISC) Study Tracking the Research Behaviour of ‘Generation Y’ Doctoral Students: Annual Report 2009-2010. Education for Change.
  5. Herndon, T., Ash, M., & Pollin, R. (2014). Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Cambridge Journal of Economics, 38(2), 257-279.
  6. Roose, Kevin. (2013). Meet the 28-Year-Old Grad Student Who Just Shook the Global Austerity Movement. New York Magazine. Available from http://nymag.com/daily/intelligencer/2013/04/grad-student-who-shook-global-austerity-movement.html. Accessed 20 Dec 2014.
  7. Grahe, J. E., Reifman, A., Hermann, A. D., Walker, M., Oleson, K. C., Nario-Redmond, M., & Wiebe, R. P. (2012). Harnessing the undiscovered resource of student research projects. Perspectives on Psychological Science, 7(6), 605-607.
  8. Frank, M. C., & Saxe, R. (2012). Teaching replication. Perspectives on Psychological Science, 7(6), 600-604.
  9. Feldman, L., Patel, D., Ortmann, L., Robinson, K., & Popovic, T. (2012). Educating for the future: another important benefit of data sharing. The Lancet, 379(9829), 1877-1878.
  10. McAuley, D., Rahemtulla, H., Goulding, J., & Souch, C. (2012). 3.3 How Open Data, data literacy and Linked Data will revolutionise higher education.
  11. Lombardi, M. M. (2007). Approaches that work: How authentic learning is transforming higher education. EDUCAUSE Learning Initiative (ELI) Paper, 5.
  12. O’Hara, K. J., & Kay, J. S. (2003). Open source software and computer science education. Journal of Computing Sciences in Colleges, 18(3), 1-7.
  13. Yates, J. R., Curtis, N., & Ramus, S. J. (2006). Collaborative research in teaching: collaboration between laboratory courses at neighboring institutions. Journal of Undergraduate Neuroscience Education, 5(1), A14.

Licensing

Text is licensed under the Creative Commons CC0 1.0 Universal waiver. To the extent possible under law, the author(s) have dedicated all copyright and related and neighbouring rights to this text to the public domain worldwide.

Open Books Image: by opensourceway on Flickr under CC-BY-SA 2.0

SciDataCon2014 Open Science Roundup

- November 18, 2014 in External Meetings, Featured, Research, Tools

SciDataCon 2014 was the first ever International Conference on Data Sharing and Integration for Global Sustainability, jointly organised by CODATA and the World Data System, two organisations that form part of the International Council for Science. The meeting was held 2-5 November in New Delhi and I had the pleasure of staying in the peaceful, green campus of IIT-Delhi, within walking distance of SciDataCon2014 at the adjacent and equally pleasant Jawaharlal Nehru University (JNU).

It was a jam-packed week but I’ve tried to pick out some of my personal highlights. Puneet Kishor has also blogged on the meeting and there was an active Twitter feed for #SciDataCon2014.

Text and Data Mining Workshop
Open Data Initiatives
Data from the People for the People
Summary
Bonus slide decks

Photo by Puneet Kishor published under CC0 Public Domain Dedication

Text and Data Mining Workshop

On Sunday 2 November I ran a workshop with Puneet Kishor of Creative Commons as a joint venture with Open Knowledge and ContentMine. Armed with highlighters, post-it notes and USB-stick virtual machines, we led a small but dedicated and enthusiastic group of immunologists, bioinformaticians, plant genomics researchers and seabed resource experts through the basics of content mining.

Photo by Puneet Kishor published under CC0 Public Domain Dedication

We covered what content mining means, when it is legal and, more broadly, some of the policy and legal frameworks which impact access to and rights of reuse for the scientific literature. We hand-annotated entity types in two papers about lion evolution and Aspergillus fungi. This aimed to get people thinking about patterns and how to program entity recognition – what instructions does a computer require to recognise what our brain categorises easily? Swapping over the papers showed 80-90% inter-participant agreement in entity mark-up, suggesting a reasonable precision and recall rate for our content mining humans!
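As a rough illustration of how agreement between two annotators could be quantified, the sketch below treats one participant's mark-up as the reference and scores another's against it using precision and recall. The tuple layout, entity labels and sample offsets are all invented for illustration; they are not data from the workshop, and exact-match scoring is only one of several possible conventions.

```python
from typing import Dict, Set, Tuple

# Hypothetical annotation format: (start offset, end offset, entity type).
Annotation = Tuple[int, int, str]


def agreement(reference: Set[Annotation], candidate: Set[Annotation]) -> Dict[str, float]:
    """Score one annotator's mark-up against another's using exact matches."""
    matched = reference & candidate  # spans both annotators marked identically
    precision = len(matched) / len(candidate) if candidate else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented example: two participants annotating the same paragraph.
annotator_a = {(0, 12, "species"), (27, 33, "location"), (40, 52, "gene"), (60, 71, "species")}
annotator_b = {(0, 12, "species"), (27, 33, "location"), (40, 52, "gene"), (80, 85, "species")}

scores = agreement(annotator_a, annotator_b)
print(scores)
```

Exact matching is strict: two spans that differ by a single character count as a disagreement, so a real comparison might prefer overlap-based matching.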

Photo by Puneet Kishor published under CC0 Public Domain Dedication

Everybody managed to scrape multiple Open Access publications and extract species names. We also discussed potential collaborations and had a virtual visitor from afar, Peter Murray-Rust. The overwhelming feeling in the room was one of dismay at the restrictions on reuse of academic content but optimism about the potential uses of content mining – we hope that an excellent collaboration opportunity around phytochemistry will come to fruition!
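To give a flavour of what extracting species names can involve, here is a minimal sketch that looks for Latin binomials with a regular expression and keeps only those whose genus appears in a small lookup list. This is not the ContentMine tooling used in the workshop; the pattern, the `KNOWN_GENERA` list and the sample sentence are assumptions made purely for illustration.

```python
import re
from typing import List

# Hypothetical lookup list; a real pipeline would use a taxonomic dictionary.
KNOWN_GENERA = {"Panthera", "Aspergillus"}

# Capitalised word followed by a lowercase word, e.g. "Panthera leo".
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{3,})\b")


def extract_species(text: str) -> List[str]:
    """Return unique binomial names whose genus is in KNOWN_GENERA, in order of appearance."""
    found: List[str] = []
    for genus, epithet in BINOMIAL.findall(text):
        if genus in KNOWN_GENERA:
            name = f"{genus} {epithet}"
            if name not in found:
                found.append(name)
    return found


sample = "The lion Panthera leo and the fungus Aspergillus niger; Panthera leo again."
print(extract_species(sample))
```

The genus filter is what stops ordinary phrases such as "The lion" being reported as species, which hints at why programming entity recognition is harder than it looks on paper.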

Legal Implications of Text and Data Mining (TDM)

Open Data Initiatives


Several sessions over the conference highlighted how far we still have to go in terms of data sharing and particularly the challenge of gaining political will required for data sharing for global sustainability. Waltraut Ritter, a member of the active Open Knowledge local group Open Data Hong Kong, presented a paper co-authored with Scott Edmunds and others, making the case to policy makers that open data can support science and innovation. There is no guidance from the Hong Kong University Grants Committee on dissemination of research data resulting from its 7.5 billion HKD annual funding pool. Data sharing was explicitly flagged as low priority in 2011 and on enquiry in 2014 Open Data HK were informed that this assessment had not changed. Arguments to appeal to policy makers are clearly required in these situations and Waltraut expanded on a few during the talk.

Exploring the complexities of sharing data for public health research, Sanna Meherally reported on a qualitative study examining the ethical and practical background to potential research data sharing, involving five sites in Asia and Africa, and focusing on stakeholder perspectives. A key takeaway message was the importance of considering cultural barriers to implementation of funder data policies. Chief concerns raised in interviews were confidentiality, the potential for data collection efforts to be underplayed and the need to give something back to research participants. That the latter point was raised by so many researchers interviewed is encouraging given the title of the next day’s session ‘Data from the people for the people’, which was another focus of SciDataCon.

Data from the People for the People – encouraging Re-use and Collaboration

This double session focused on citizen science projects around topics related to sustainability, including biodiversity and climate change. Norbert Schmidt introduced projects in the Netherlands to monitor air quality while Raman Kumar from the Nature Conservation Foundation introduced a range of bird and plant ecology citizen science projects in India such as eBird, MigrantWatch and SeasonWatch. You can find the full session list here.

Cumulative hours of birding as of Sep 2014 through the eBird India citizen science initiative

Most questions raised surrounded the validation of data quality from citizen scientists, which has been addressed at length by several projects. Later presentations and discussions moved to some very pressing matters in participatory science – how to build and retain a community of contributors and how to manage outputs in a way that is accessible to and benefits contributors, a similar point to that raised by Sanna Meherally. Retention of volunteers is a particular issue in longitudinal studies in ecology, as data is required for the same locality over multiple years so repeat volunteering is essential.

Tyng-Ruey Chuang tackled some of these issues in his talk on ‘Arrangements for Data Sharing and Reuse in Citizen Science Projects’. He asked projects to compare themselves to Wikipedia in terms of openness, participation and tools. For instance, does your project retain or strip metadata from contributed images? Tyng-Ruey also emphasised informed participation – clearly state if citizen contributions are prima facie uncopyrightable or ask agreement for open licensing. This chimed with earlier points by Ryosuke Shibasaki about the need for citizen ownership of contributed data and agency to make informed decisions about its use.

The talk ended with a call to action, as the Open Definition was practically quoted and Tyng-Ruey called for raw data, now! He’s in good company at Open Knowledge!

Arrangements for Data Sharing and Reuse in Citizen Science Projects

Summary

The sessions above are only a small subset of the conversations happening across the whole programme and papers are available online for all sessions. There were many demands for more open data, from Theo Bloom using her keynote to call for the abolition of data release embargos to Chaitanya Baruo revealing that Indian geology students are using US data because India does not make its own data available for open academic research. However, there were also excellent case studies of the reuse of data and its value. It would have been interesting to see some more cross-cutting sessions including all of the data collection and sharing cycle, but that will need to wait for 2016! This is a thoroughly recommended conference for data scientists and managers as well as domain experts and has notable participation from the global South, which is excellent and enriches the perspectives discussed.

Finally, I can only apologise for not being able to report on the Strategies Towards Open Science Panel – I was giving a talk at IIT which clashed with the session, but I’ve no doubt some excellent points were raised which will soon be shared!

Bonus slide decks

I couldn’t attend these sessions, but they’re worth a look! First up Susanna Sansone and Brian Hole on data journals:

Improving openness, transparency and reproducibility in scientific research

- October 24, 2014 in Featured, Guest Post, Reproducibility, Research, Tools

This is a guest post for Open Access Week by Sara Bowman of the Open Science Framework.

Understanding reproducibility in science

Reproducibility is fundamental to the advancement of science. Unless experiments and findings in the literature can be reproduced by others in the field, the improvement of scientific theory is hindered. Scholarly publications disseminate scientific findings, and the process of peer review ensures that methods and findings are scrutinized prior to publication. Yet, recent reports indicate that many published findings cannot be reproduced. Across domains, from organic chemistry (Trevor Laird, “Editorial Reproducibility of Results” Organic Process Research and Development) to drug discovery (Asher Mullard, “Reliability of New Drug Target Claims Called Into Question” Nature Reviews Drug Development) to psychology (Meyer and Chabris, “Why Psychologists’ Food Fight Matters” Slate), scientists are discovering difficulties in replicating published results.

Various groups have tried to uncover why results are unreliable or what characteristics make studies less reproducible (see John Ioannidis’s “Why Most Published Research Findings Are False,” PLoS, for example). Still others look for ways to incentivize practices that promote accuracy in scientific publishing (see Nosek, Spies, and Motyl, “Scientific Utopia II: Restructuring Incentives and Practices to Promote Truth Over Publishability” Perspectives on Psychological Science). In all of these, the underlying theme is the need for transparency surrounding the research process – in order to learn more about what makes research reproducible, we must know more about how the research was conducted and how the analyses were performed. Data, code, and materials sharing can shed light on research design and analysis decisions that lead to reproducibility. Enabling and incentivizing these practices is the goal of The Open Science Framework, a free, open source web application built by the Center for Open Science.


The right tools for the job

The Open Science Framework (OSF) helps researchers manage their research workflow and enables data and materials sharing both with collaborators and with the public. The philosophy behind the OSF is to meet researchers where they are, while providing an easy means for opening up their research if it’s desired or the time is right. Any project hosted on the OSF is private to collaborators by default, but making the materials open to the public is accomplished with a simple click of a button.

Here, the project page for the Reproducibility Project: Cancer Biology demonstrates the many features of the Open Science Framework (OSF). Managing contributors, uploading files, keeping track of progress and providing context on a wiki, and accessing view and download statistics are all available through the project page.

Features of the OSF facilitate transparency and good scientific practice with minimal burden on the researcher. The OSF logs all actions by contributors and maintains full version control. Every time a new version of a file is uploaded to the OSF, the previous versions are maintained so that a user can always go back to an old revision. The OSF performs logging and maintains version control without the researcher ever having to think about it – no added steps to the workflow, no extra record-keeping to deal with.

The OSF integrates with other services (e.g., GitHub, Dataverse, and Dropbox) so that researchers continue to use the tools that are practical, helpful, and a part of the workflow, but gain value from the other features the OSF offers. An added benefit is in seeing materials from a variety of services next to each other – code on GitHub and files on Dropbox or Amazon S3 appear next to each other on the OSF – streamlining research and analysis processes and improving workflows.

 Each project, file, and user on the OSF has a persistent URL, making content citable. The project in this screenshot can be found at https://osf.io/tvyxz.

Other features of the OSF incentivize researchers to open up their data and materials. Each project, file, and user is given a globally unique identifier – making all materials citable and ensuring researchers get credit for their work. Once materials are publicly available, the authors can access statistics detailing the number of views and downloads of their materials, as well as geographic information about viewers. Additionally, the OSF applies the idea of “forks,” commonly used in open source software development, to scientific research. A user can create a fork of another project, to indicate that the new work builds on the forked project or was inspired by the forked project. A fork serves as a functional citation; as the network of forks grows, the interconnectedness of a body of research becomes apparent.

Openness and transparency about the scientific process inform the development of best practices for reproducible research. The OSF seeks both to enable that transparency, by taking care of “behind the scenes” logging and versioning without added burden on the researcher, and to improve overall efficiency for researchers and their daily workflows. By providing tools for researchers to easily adopt more open practices, the Center for Open Science and the OSF seek to improve openness, transparency, and – ultimately – reproducibility in scientific research.

10,000 #OpenScience Tweets

- March 20, 2014 in Media, Research, Tools

We have collected 10,000+ tweets using the #openscience hashtag on Twitter, and invite volunteers to help analyse the data. The twelve most-retweeted tweets are embedded below.

Happily, just over 4,600 accounts have participated in the Open Science community with its eponymous hashtag, in this span. The 10,000 tweets have accrued over ten weeks. Our own @openscience on Twitter has tweeted most, over 600 times at the hashtag, as well as having received the most retweets and @ mentions, over 8,000 in these 10,000.

We have modified the visualisation which came with the data via the satisfying TAGS effort shared by Martin Hawksey. To Martin’s default views we added looks at the numbers of mentions and of mentions per tweet for top tweeters, and rankings of top tweets for the past ten weeks. We will continue collecting tweets, but do note that in another month or so, we will reach Google Docs limits, e.g. on numbers of cells. We will use additional sheets, so links to the data will change; exactly how depends on when you are reading this post. Ask us @openscience on Twitter.

Help wanted

More could be done; won’t you help? Leave a reply below or ping us @openscience on Twitter if you need edit access to the sheet itself. We would also like to see the data and analyses in other tools. Our work to this point is only to get something started.
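For anyone who would like to try the data in another tool, here is a minimal Python sketch of the kind of counts described above: tweets per account and @ mentions received. The sample rows are made up, and the column names `from_user` and `text` are assumptions about how an export of the sheet might look, so adjust them to match the real data.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for an export of the tweet sheet.
# The keys "from_user" and "text" are assumed column names.
tweets = [
    {"from_user": "alice", "text": "Loving #openscience with @bob and @openscience"},
    {"from_user": "bob", "text": "RT @alice: Loving #openscience"},
    {"from_user": "alice", "text": "New preprint out #openscience"},
]

# An @ followed by word characters is treated as a mention.
MENTION = re.compile(r"@(\w+)")


def summarise(rows):
    """Return (tweets sent per account, mentions received per account)."""
    tweets_by_user = Counter(row["from_user"].lower() for row in rows)
    mentions = Counter(
        name.lower() for row in rows for name in MENTION.findall(row["text"])
    )
    return tweets_by_user, mentions


tweets_by_user, mentions = summarise(tweets)
print(tweets_by_user.most_common(3))
print(mentions.most_common(3))
```

With the full sheet exported to CSV, the rows could be read with `csv.DictReader` and passed to `summarise` unchanged.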

Top #openscience tweets of the past ten weeks

The above list is not dynamic. The data collected and displayed here, however, are dynamic and refresh themselves hourly.

Not all tweets which are about Open Science include the #openscience hashtag. In a perfectly semantic world they would, and when they can, they really should. It has helped to form a community among the 4,600+ accounts participating in these ten weeks and many others in recent years. A couple of reasons the hashtag might not be used in a relevant tweet include the character limit on tweets and lack of awareness of hashtags or of the term Open Science.

We take our organising and leadership role seriously at @openscience on Twitter, an account shared by many in the community. We have a simple policy that all our tweets should be related to Open Science. Even at our account, not all our tweets include the #openscience hashtag, particularly as we discuss related concerns such as Citizen Science or Open Access. An example tweet from the time frame considered here, related to Open Science but not hashtagged as such is below. In this case, the limit on tweet length and the topic led to including #openaccess, not #openscience:

 

The most retweeted, Open Science related tweet of all time, so far as we know, did not use the #openscience hashtag but was lovely. From the Lord of Dance and Prince of Swimwear:

 

Google Summer of Code – roll up for some great open science projects!

- March 15, 2014 in Announcements, events, Tools

The Google Summer of Code mentors have been announced and they include organisations working on some great open science tools. We encourage anyone who works with or knows keen undergraduate coders to promote this opportunity to participate in the summer programme.

Time is short as the deadline for applications is 21 March and ideally students would already be in contact with mentoring organisations, but do spread the word!

  • The National Resource for Network Biology (NRNB) is organizing the joint efforts of GenMAPP, Cytoscape, and WikiPathways.
  • Kitware are mentoring on open source chemistry visualisation among other projects.
  • Public Lab are developing project ideas around spectroscopy, aerial mapmaking, and infrared imagery with open hardware tools for citizen science.
  • Scaffoldhunter has ideas on molecular visualisation among other projects.

Plus many more on the full mentors list

We hope to see even more organisations with open science projects involved in 2015 and to play a role in promoting this increase. If you think you would like to mentor for GSoC, do make use of the open-science mailing list to draw on the advice and experience of those who have been through the process before and bounce ideas around.


GSoC logo CC-BY-NC-ND 3.0

Content Mining: Scholarly Data Liberation Workshop

- December 14, 2013 in events, Oxford Open Science, Research, Tools

The November Oxford Open Science meeting brought over 20 researchers together for a ‘Content Mining: Scholarly Data Liberation Workshop’.

Iain Emsley and Peter Murray-Rust kicked off proceedings by presenting their work on mining Twitter and academic papers in chemistry and phylogenetics respectively.

Next we tried out web-based tools such as Tabula for extracting tables from PDF (we were fortunate enough to have Manuel Aristarán of Tabula joining us remotely via Skype) and ChemicalTagger for tagging and parsing experimental sections in chemistry articles.

We then got down to business with some hands-on extraction of species from HTML papers and mentions of books on Twitter using regular expressions. All code is open source so you are welcome and encouraged to play, fork and reuse!

Peter’s tutorial and code to extract species from papers can be found on bitbucket and the relevant software and command line tools have helpfully been bundled into a downloadable package. Iain has also documented his flask application for Twitter mining on github so have a go!
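To give a flavour of the regular expression approach, here is a minimal Python sketch. It is not Peter’s code. It rests on the assumption that species names appear as italicised Latin binomials (a capitalised genus followed by a lowercase species epithet) inside `<i>` or `<em>` tags, which will miss plenty of real-world cases; the sample HTML is made up.

```python
import re

# Assumption: a species name is an italicised binomial, e.g. <i>Genus species</i>.
# \1 makes the closing tag match the opening one.
BINOMIAL = re.compile(r"<(i|em)>\s*([A-Z][a-z]+\s[a-z]{2,})\s*</\1>")


def extract_species(html):
    """Return candidate species names in order of first appearance."""
    seen = []
    for match in BINOMIAL.finditer(html):
        name = match.group(2)
        if name not in seen:
            seen.append(name)
    return seen


# Made-up snippet of an HTML paper.
sample = (
    "<p>We sampled <i>Okapia johnstoni</i> and <em>Macropus rufus</em>; "
    "<i>Okapia johnstoni</i> was rarer. See <i>in vitro</i> results.</p>"
)
print(extract_species(sample))
```

Italicised phrases such as “in vitro” are skipped because they do not start with a capital letter, but a production tool would still need a list of known names to filter false positives.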

If this has whetted your appetite for finding out more about content mining for your research and you’d like to ask for input or help or simply follow ongoing discussion then join our open content mining mailing list


Some furry friends joined in the efforts - meet Chuff the OKF Okapi and AMI the kangaroo

On the Harvard Dataverse Network Project (and why it’s awesome)

- December 10, 2013 in Panton Fellowships, Panton Principles, Tools

I am a huge fan of grass-roots approaches to scholarly openness. Successful community-led initiatives tend to speak directly to that community’s need and can grow by attracting interest from members on the fringes (just look at the success of the arXiv, for example). But these kinds of projects tend to be smaller scale and can be difficult to sustain, especially without any institutional backing or technical support.

This is why the Harvard Dataverse Network is so great: it facilitates research data sharing through a sustainable, scalable, open-source platform maintained by the Institute for Quantitative Social Sciences at Harvard. This means it is sustainable through institutional backing, but also empowers individual communities to manage their own research data.

In essence, a Dataverse is simply a data repository, but one that is both free to use and fully customisable according to a community’s need. In the project’s own words:

 

‘A Dataverse is a container for research data studies, customized and managed by its owner. A study is a container for a research data set. It includes cataloging information, data files and complementary files.’

(http://thedata.harvard.edu/dvn/)

 

There are a number of ways in which the Dataverse Network can be used to enable Open Data.

Journals

A Dataverse can be a great way of incentivising data deposition among journal authors, especially when coupled with journal policies of mandating Open Data for all published articles. Here, a journal’s editor or editorial team would maintain the Dataverse itself, including its look and feel, which would instil confidence in authors that the data is in trusted hands. In fact, for journals housed on Open Journal Systems, there will soon be a plugin launched that directly links the article submission form with the journal’s Dataverse. And so, from an author’s perspective, the deposition of data will be as seamless as submitting a supporting information file. This presentation [pdf] goes into the plugin in more detail (and provides more info on the Dataverse project itself).

(Sub-)Disciplines

There are some disciplines that simply do not have their own subject-specific repository and so a Dataverse would be great for formalising and incentivising Open Data here. In many communities, datasets are uploaded to general repositories (Figshare, for example) that may not be tailored to their needs. Although this isn’t a problem – it’s great that general repositories exist – a discipline-maintained repository would automatically confer a level of reputation sufficient to encourage others to use it. What’s more, different communities have different preservation/metadata needs that general repositories might not be able to offer, and so the Dataverse could be tailored exactly to that community’s need.

Individuals

Interestingly, individuals can have their own Dataverses for housing all their shared research data. This could be a great way of allowing researchers to showcase their openly available datasets (and perhaps research articles too) in self-contained collections. The Dataverse could be linked to directly from a CV or institutional homepage, offering a kind of advertisement for how open a scholar one is. Furthermore, users can search across all Dataverses for specific keywords, subject areas, and so on, so there is no danger of being siloed off from the broader community.

So the Dataverse Network is a fantastic project for placing the future of Open Data in the hands of researchers and it would be great to see it adopted by scholarly communities throughout the world.

 

Open and transparent altmetrics for discovery

- December 9, 2013 in Panton Fellowships, Research, Tools

by AG Cann

Altmetrics are a hot topic in the scientific community right now. Classic citation-based indicators such as the impact factor are complemented by alternative metrics generated from online platforms. Usage statistics (downloads, readership) are often employed, but links, likes and shares on the web and in social media are considered as well. The altmetrics promise, as laid out in the excellent manifesto, is that they assess impact more quickly and on a broader scale.

The main focus of altmetrics at the moment is evaluation of scientific output. Examples are the article-level metrics in PLOS journals, and the Altmetric donut. ImpactStory has a slightly different focus, as it aims to evaluate the oeuvre of an author rather than an individual paper.

This is all good and well, but in my opinion, altmetrics have a huge potential for discovery that goes beyond rankings of top papers and researchers. A potential that is largely untapped so far.

How so? To answer this question, it is helpful to shed a little light on the history of citation indices.

Pathways through science

In 1955, Eugene Garfield created the Science Citation Index (SCI) which later went on to become the Web of Knowledge. His initial idea – next to measuring impact – was to record citations in a large index to create pathways through science. Thus one can link papers that are not linked by shared keywords. It makes a lot of sense: you can talk about the same thing using totally different terminology, especially when you are not in the same field. Furthermore, terminology has proven to be very fluid even in the same domain (Leydesdorff 1997). In 1973, Small and Marshakova realized – independently of each other – that co-citation is a measure of subject similarity and therefore can be used to map a scientific field.

Due to the fact that citations are considerably delayed, however, co-citation maps are often a look into the past and not a timely overview of a scientific field.
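To make the idea of co-citation concrete, here is a minimal Python sketch that counts how often two works appear together in the same reference list. The paper names and reference lists are invented for illustration, and a real analysis would normalise the raw counts before mapping a field.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each citing paper maps to the references it cites.
reference_lists = {
    "paper_A": ["Garfield1955", "Small1973", "Leydesdorff1997"],
    "paper_B": ["Garfield1955", "Small1973"],
    "paper_C": ["Small1973", "Leydesdorff1997"],
}


def cocitation_counts(reference_lists):
    """Count how many citing papers reference both members of each pair."""
    counts = Counter()
    for refs in reference_lists.values():
        # Sort and de-duplicate so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


counts = cocitation_counts(reference_lists)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts are treated as similar in subject, which is what allows a map of a field to be drawn from citation data alone.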

Altmetrics for discovery

In come altmetrics. Similarly to citations, they can create pathways through science. After all, a citation is nothing else but a link to another paper. With altmetrics, it is not so much which papers are often referenced together, but rather which papers are often accessed, read, or linked together. The main advantage of altmetrics, as with impact, is that they are available much earlier.

Bollen et al. (2009): Clickstream Data Yields High-Resolution Maps of Science. PLOS One. DOI: 10.1371/journal.pone.0004803.

One of the efforts in this direction is the work of Bollen et al. (2009) on click-streams. Using the sequences of clicks to different journals, they create a map of science (see above).
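As a rough illustration of the general idea, and not of the method Bollen et al. actually used, the following Python sketch counts how often readers move from one journal to another within a session. The sessions and journal names are invented.

```python
from collections import Counter

# Hypothetical click sessions: the ordered list of journals one reader visited.
sessions = [
    ["J. Ecology", "J. Biology", "J. Ecology"],
    ["J. Biology", "J. Chemistry"],
    ["J. Ecology", "J. Biology"],
]


def transition_counts(sessions):
    """Count directed moves between consecutive, distinct journals."""
    counts = Counter()
    for clicks in sessions:
        for src, dst in zip(clicks, clicks[1:]):
            if src != dst:  # ignore repeated clicks within the same journal
                counts[(src, dst)] += 1
    return counts


transitions = transition_counts(sessions)
for (src, dst), n in transitions.most_common():
    print(f"{src} -> {dst}: {n}")
```

The counts form a weighted, directed graph of journals, which is the kind of structure a map of science can then be laid out from.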

In my PhD, I looked at the potential of readership statistics for knowledge domain visualizations. It turns out that co-readership is a good indicator for subject similarity. This allowed me to visualize the field of educational technology based on Mendeley readership data (see below). You can find the web visualization called Head Start here and the code here (username: anonymous, leave password blank).

http://labs.mendeley.com/headstart

Why we need open and transparent altmetrics

The evaluation of Head Start showed that the overview is indeed more timely than maps based on citations. However, it also provided further evidence that altmetrics are prone to sample biases. In the visualization of educational technology, the computer science driven areas such as adaptive hypermedia are largely missing. Bollen and Van de Sompel (2008) reported the same problem when they compared rankings based on usage data to rankings based on the impact factor.

It is therefore important that altmetrics are transparent and reproducible, and that the underlying data is openly available. This is the only way to ensure that all possible biases can be understood.

As part of my Panton Fellowship, I will try to find datasets that satisfy these criteria. There are several examples of open bibliometric data, such as the Mendeley API and the figshare API, which have adopted CC BY, but most of the usage data is not available publicly or cannot be redistributed. In my fellowship, I want to evaluate the goodness of fit of different open altmetrics data. Furthermore, I plan to create more knowledge domain visualizations such as the one above.

So if you know any good datasets please leave a comment below. Of course any other comments on the idea are much appreciated as well.

Open Scholar Foundation

- December 6, 2013 in Announcements, Guest Post, Reproducibility, Research, Tools

This is a guest post from Tobias Kuhn of the Open Scholar Foundation. Please comment below or contact him via the link above if you have any feedback on this initiative!

The goal of the Open Scholar Foundation is to improve the efficiency of scholarly communication by providing incentives for researchers to openly share their digital research artifacts, including manuscripts, data, protocols, source code, and lab notes.

The proposal of an “Open Scholar Foundation” was one of the winners of the 1K challenge of the Beyond the PDF conference. This was the task of the challenge:

What would you do with 1K that would significantly advance scholarly communication that does not involve building a new software tool?

The idea was to establish a committee that would certify researchers as “Open Scholars” according to given criteria. This was the original proposal:

I would set up a simple “Open Scholar Foundation” with a website, where researchers can submit proofs that they are “open scholars” by showing that they make their papers, data, metadata, protocols, source code, lab notes, etc. openly available. These requests are briefly reviewed, and if approved, the applicant officially becomes an “Open Scholar” and is entitled to show a banner “Certified Open Scholar 2013” on his/her website, presentation slides, etc. Additionally, there could be annual competitions to elect the “Open Scholar of the Year”.

An alternative approach (perhaps more practical and promising) would be to provide a scorecard for researchers to calculate their “Open Scholar Score” on their own. There is an incomplete draft of such a scorecard in the github repo here.
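To illustrate what a self-service scorecard could look like, here is a minimal Python sketch. It is not the draft in the repo: the artifact types are simply those named in the proposal, and the equal weighting and 0 to 100 scale are assumptions made for the example.

```python
# Artifact types taken from the original proposal; the equal weighting and
# the 0-100 scale are assumptions for illustration only.
ARTIFACT_TYPES = ["papers", "data", "metadata", "protocols", "source code", "lab notes"]


def open_scholar_score(shared):
    """Return the percentage of artifact types a researcher shares openly."""
    shared = set(shared)
    unknown = shared - set(ARTIFACT_TYPES)
    if unknown:
        raise ValueError(f"Unknown artifact types: {sorted(unknown)}")
    return round(100 * len(shared) / len(ARTIFACT_TYPES))


score = open_scholar_score(["papers", "data", "source code"])
print(score)
```

A real scorecard would probably weight the criteria differently and ask for evidence, such as links to the shared material.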

In any case, this project should lead to an established and recognized foundation that motivates scholars to openly share their data and results. Being a certified Open Scholar should be something that increases one’s reputation and visibility, and should give a counterweight to possible benefits from keeping data and results secret. The criteria for Open Scholars should become more strict over time, as the number of “open-minded” scholars hopefully increases over the years. This should go on until, eventually, scholarly communication has fundamentally changed and does not require this special incentive anymore.

It is probably a good idea to use Mozilla Open Badges for these Open Scholar banners.

We are at the very beginning with this initiative. If you are interested in joining, get in touch with us! We are open to any kind of feedback and suggestions.

OKCon Open & Citizen Science hackday: projects

- September 14, 2013 in Announcements, events, Hackday, Members, OKCon, Tools

Join us geeking out Thursday, Sept 19, 10:00 to 17:00 CEST at #OKCon and online! Details are below. See also our announcement of this event and everyone’s votes for favourite projects.

For WikiSprint: Global overview of Open Science initiatives please join us remotely via the coordinating Etherpad (found at https://etherpad.mozilla.org/xpQvKfNv5c) and working either here or on Wikipedia.

For other projects, join us in IRC: #openscience on freenode or via the web at http://webchat.freenode.net/?channels=openscience. Find us on Twitter @MaliciaRogue, @stefankasberger, @openscience, and at #openscience or #OKCon.

Proposal 1

Title: “Open Data in Research: an illusion?”

Details: Despite the dazzling development of the open access movement, open data initiatives in science and research are still trailing in involvement. Additionally, disparities in research data sharing and openness are huge across scientific communities and domains.

Last but not least, formats and licensing terms vary greatly even within a specific field. This suggested activity will wrap up current initiatives and achievements prior to formalizing the challenges ahead. The medium-term goal is to bootstrap connections converging to a true institutional change that leads to more participative, shareable and transparent science: the science of tomorrow.

Support: Open Data enthusiasts, geeks and science nerds welcome.

Comment: Remote participation welcome (IRC, pad). Hashtag: #OpenSciData

Proposal 2

Title: “An inclusive approach to open science”

Details: The discourse in open science often runs along the lines of open vs. closed approaches. In reality though, most researchers act in-between those two extremes. From successful examples such as genomics, we can see that open science is essentially a community effort (cp. Bermuda Principles). Therefore, we (the Austrian chapter of the OKFN) advocate an inclusive approach to open science.

From a community perspective, it is the commitment to openness that matters, and the willingness to promote this openness on editorial boards and program committees. It is therefore important to get as many researchers on board as possible. This approach is _not_ intended to replace existing initiatives but to make researchers aware of these initiatives and to help them choose their approach to open science.

The idea of this hackathon is to create a manifesto/declaration for such an inclusive approach. A draft and a first discussion can be found here: http://science20.wordpress.com/2013/06/25/an-inclusive-approach-to-open-science/
We invite contributions from researchers in various disciplines on their experiences with advocating and implementing open science practices. This could be in the form of presentations, lightning talks, or focused discussions.

Support: We mainly need creative minds; designers, illustrators, and animators are welcome as we could produce a short video about the idea.

Comment: N/A

Proposal 3

Title: “Wikisprint: Global overview of OpenScience initiatives”

Details: A few months ago an event was organised to aggregate links and knowledge about P2P initiatives. http://codigoabiertocc.wordpress.com/2013/08/07/globalp2p-the-wind-that-shook-the-net/
In partnership with Michel Bauwens of the P2P Foundation and HackYourPhd I’d like to organize a similar event for OpenScience initiatives. The P2P Foundation aims to promote and document peer to peer practices in a very broad sense. The collective HackYourPhd federates numerous students, researchers and citizens interested in the production and the sharing of knowledge. Being an administrator on the French Wikipedia, I will likely get support from the Wikimedia communities.

This “wikisprint” will be set up as follows:

  1. The idea will be to announce the event a few days beforehand and invite people on Twitter and other platforms to share their initiatives with us.
  2. We could for example use the hashtag #OpenScienceWiki
  3. During the hackathon, people in Geneva but also elsewhere could help to aggregate the links in a wiki, interact with people all around the world and invite them to share their initiatives.
    We can use the P2Pwiki: http://p2pfoundation.net/Spanish_P2P_WikiSprint
    We could also place these OpenScience initiatives on a map: http://maps.ubimix.com/hyphdus/
  4. We could also visualize all the interactions with the hashtag
    Here is an example of what people have done during the #GlobalP2P event: http://demos.outliers.es/wikiSprint/
  5. Once the broad mapping is done on the P2Pwiki, it could serve to enhance several Wikipedia articles on Open Science. The content is currently rather poor: see for instance http://en.wikipedia.org/wiki/Open_knowledge and to a lesser extent http://en.wikipedia.org/wiki/Open_Science. Wikidata, the growing open data repository of the Wikimedia Foundation, could also use some contributions on the topic: https://www.wikidata.org/wiki/Q2251455 and https://www.wikidata.org/wiki/Q309823 are empty.
  6. Illustrations and dataviz might also be welcome: for instance, graphics of academic publishing economics (figures are rather hard to get).

Support: Designers and programmers are welcome for the visualisation

Comment: Here are some guidelines given by Michel Bauwens to help us organize this workshop.

  • it’s important to give some basic how to advice at the beginning of the process
  • in each locale, it’s good to have a person that can just wander around and help and stimulate the other people (this makes a big difference)
  • we had a permanent rolling hangout, with every hour a different topic to be discussed (it went on for 15 hours or so during the hispanic wikisprint)
  • it makes it much easier if there is a pre-established form, with the tickable tags etc.
  • clear delimitation of subject matter, not anything goes; make sure you specify what open science is inclusive of, perhaps some geographic limitation (say Europe) etc.
  • choice of tags: one for the event itself, say [[Category:OpenScience Wikisprint]]; one for the topic, so that it continues to live on after the event, say [[Category:Open Science]]
    this can be combined for example with country tags, [[Category:France]] etc.
    (the p2pfoundation.net wiki already has http://p2pfoundation.net/Category:Science for the broader p2p/commons aspects of science, this would allow a more specialized focus)
    I will also be present during this workshop to help with the interaction with Wikipedia and the Wikipedia community.
  • Including Wikipedia within the wikisprint could stimulate global contribution by attracting experienced wiki users. We can create a parallel contribution project (an example: http://en.wikipedia.org/wiki/Wikipedia:GLAM/MonmouthpediA )
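
The tagging scheme suggested in the guidelines above (one tag for the event, one for the topic, optionally one for the country) could be generated automatically from a pre-established form. Here is a minimal sketch, assuming a hypothetical helper `category_footer`; only the category names come from the guidelines, everything else is invented for illustration.

```python
def category_footer(event, topic, country=None):
    """Build the wikitext category lines to append to a wiki page.

    `event` and `topic` are always included; `country` is optional.
    """
    tags = [event, topic]
    if country:
        tags.append(country)
    return "\n".join(f"[[Category:{tag}]]" for tag in tags)


# Category names taken from the guidelines above
footer = category_footer("OpenScience Wikisprint", "Open Science", "France")
print(footer)
```

Appending this footer to every page created during the sprint keeps the event tag and the long-lived topic tag consistent across contributors.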

Proposal 4

Title: “rOpenGov – R ecosystem for social and political science”

Details: With the avalanche of open government data, and of data from other fields relevant to computational social science, new algorithms are needed to take full advantage of these new information resources: to access, analyse and communicate such information in a fully transparent and reproducible manner, as part of scientific inquiry or citizen science projects.

A scalable solution will require coordinated effort from independent developers. Hence, we are now building up a community-driven ecosystem of R packages dedicated to open government data and computational social and political science, building on lessons learned from analogous and wildly successful projects in other fields. The site already provides open source R tools for open government data analytics for Austria, Finland, and Russia and we are now actively collecting further contributions.

The preliminary project website is at: http://louhos.github.io/en/ropengov.html

Support: In addition to internet access, the project would benefit from contributions from website designers, scientists and R package developers.

Comment: Remote participation in the hackathon through IRC/Skype is also possible.

Proposal 5

Title: “Crowdcrafting Everywhere”

Details: Crowdcrafting is a straightforward, handy open source tool for citizen science. Unfortunately, Crowdcrafting is only available in English for now. What about translating it into other languages, e.g. French, Spanish, Russian, …?

Support: Multilingual enthusiasts welcome!

Comment: Remote participation welcome.

Hashtag: #CCEverywhere.

Crowdcrafting’s lead developer, Daniel Lombrana-Gonzalez, will also be with us throughout the whole day.

Proposal 6

Title: Open Access Button

Details: Open Access Button is a browser-based tool which tracks every time someone is denied access to a paper. We want to display this, along with the person’s location, profession and story on a real time, worldwide, interactive map of the problem. While creating pressure to open up scholarly and scientific research, we also want to help people work within the current broken system by helping them get access to the paper they need.

That’s the project summed up really briefly. We built a prototype at the start of the summer and are working towards a launch later in the year.
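
To make the idea concrete, here is a small sketch of how blocked-access reports could be grouped into points for a map. It is not the Open Access Button's actual data model: the `BlockedAccess` record, its field names, the `map_points` helper and the sample reports are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockedAccess:
    """One report of a reader being denied access to a paper (hypothetical)."""
    paper_url: str
    latitude: float
    longitude: float
    profession: str
    story: str


def map_points(reports, precision=1):
    """Group reports by rounded coordinates so nearby ones share a marker."""
    counts = Counter(
        (round(r.latitude, precision), round(r.longitude, precision))
        for r in reports
    )
    return [
        {"lat": lat, "lon": lon, "count": n}
        for (lat, lon), n in sorted(counts.items())
    ]


# Invented sample reports, for illustration only
reports = [
    BlockedAccess("https://example.org/paper-1", 46.204, 6.143, "student", "Needed for my thesis"),
    BlockedAccess("https://example.org/paper-2", 46.21, 6.14, "nurse", "Checking a treatment"),
    BlockedAccess("https://example.org/paper-3", 51.507, -0.128, "teacher", "Preparing a lesson"),
]

points = map_points(reports)
for point in points:
    print(point)
```

Rounding the coordinates is one simple way to cluster nearby reports; a real map would probably cluster by zoom level instead.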

Support: tbc

Comment: Waiting for confirmation for founders to join in person. Remote participation will be confirmed soon.

Proposal 7

Title: “Booksprint: OpenScience Guidelines for PhD Students and researchers”

Description: Organize a book sprint to write a guide about how to do open science for researchers or PhD students.

No special skills are needed to participate if you are a PhD student, a student, or know the basics of science from another area. We will share our ideas and experience with open science.

Possible chapters:
* What does it mean to publish in open access?
* How do you go about publishing in open access?
* What is an “Open notebook”?
* How do I organize an open notebook?
* Which other tools are available?
* What tools are missing?
* How do we communicate and better support each other?
etc.

To write the book, we will use Fidus Writer ( http://fiduswriter.org ), an open source, web-based editor that typesets academic writing with citations and formulas, and lets us publish PDFs or ebooks of articles and/or journals without any technical skills. The Fidus Writer team will assist via hangout/chat.

Support: Designers are welcome to help with figures and other visualisations.
Internet access has to be available and Google Chrome or Chromium installed on the machines.
Artistic minds are also welcome 🙂

Comment: I think it would be a good idea to find a printing solution as well, because having something in your hands can be very engaging, and it would be great for HackYourPhD to have something to show around. But this could be done afterwards.