Under the motto "Wissenschaft neu kommunizieren" (communicating science in new ways), school students and young university students from different disciplines come together for one day to develop new, open ways of presenting science. — >> When: Saturday, 21 November 2015, 9:00 – 20:00 >> Where: TU Wien, Festsaal and Boecklsaal >> Who: Open to school students aged 17 and up and […]
After a bunch of unsuccessful attempts to get some sort of project going within an Open Science community, I decided to start researching how to build a successful non-technical Open * community. I’m aware that it could just be a matter of time commitment, but I still think it would be worth it […]
Doing a PhD is laborious, hard, demanding, exhausting… Your thesis is usually the result of blood, sweat and tears. And you are usually alone. Well, what would you say if I told you that a researcher got help from more than 17 thousand volunteers?
Yes, you’ve read that right: more than 17 thousand people helped Alejandro Sánchez do his research; he published his thesis as a result and got the best possible mark: cum laude. Amazing, right?
But how did this happen? How did he manage to involve such a big crowd? I mean, most people think science is boring, tedious, difficult, add your own adjective here… Yet this guy managed to get 17 thousand people from all over the world to help him:
Best part? They did it because they wanted to help. No money involved! Just pure kindness.
In other words, the unexpected happened. By sharing his work and asking for help with his research -studying light pollution in cities- he achieved the inconceivable: involving more than 17 thousand people in scientific research.
How did this start? Well, let’s start from the beginning.
The beginning: laying down the ideas
At the summit there was a workshop where scientists and hackers joined forces to create new citizen science projects. Wait, let me first explain what citizen science is, so we can enjoy the trip later on (like this kid, I promise).
Citizen science is the active contribution of people who are not professional scientists to science. It provides volunteers with the opportunity to contribute intellectually to the research of others, to share resources or tools at their disposal, or even to start their own research projects. Volunteers provide real value to ongoing research while they themselves acquire a better understanding of the scientific method.
In other words, citizen science opens the doors of laboratories and makes science accessible to all. It facilitates a direct conversation between scientists and enthusiasts who wish to contribute to scientific endeavor.
Now, with this idea in our minds let’s get back to Alejandro’s research.
At this workshop Alejandro told me that he was studying light pollution in cities. He and his team had realized that astronauts on the International Space Station take pictures of the Earth with a regular camera. Those pictures are then saved in a big archive. However, there are some issues:
- The pictures could be of cities at night or during the day.
- They take selfies too (who doesn’t?)
- The moon, stars and Aurora Borealis are also pretty, so they photograph them too.
- The archive does not have any order or filter, everything is mixed in there.
In summary, he needs sharp, cloud-free pictures of cities at night, but the archive is a mess. It contains so many different photos and possible scenarios that algorithms cannot classify them (or, at a later stage, geolocate them). However, you and I are pretty good at identifying cities at night at a glance, so we decided to create a prototype on Crowdcrafting.
The first project was Dark Skies. We had the first prototype up in a few hours, and we basically asked people to help us classify the pictures into different categories:
- City at night
- Aurora Borealis
- None of these
- I don’t know
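At its core, this kind of project reduces many volunteers’ answers about one picture to a single label. As a hedged sketch of the idea (the function name and agreement threshold are my own; Crowdcrafting/PyBossa has its own task-redundancy logic):

```python
from collections import Counter

# Hypothetical sketch: reduce several volunteers' answers for one ISS
# picture to a single label by majority vote, or give up when the
# volunteers disagree too much.
def classify(answers, min_agreement=0.6):
    """Return the majority label, or None if agreement is below the threshold."""
    if not answers:
        return None
    label, votes = Counter(answers).most_common(1)[0]
    return label if votes / len(answers) >= min_agreement else None
```

For example, `classify(["City at night", "City at night", "I don't know"])` yields "City at night", while a 1–1 split between "City at night" and "Aurora Borealis" yields None, so the task can be shown to more volunteers.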
The project was simple and fun. I remember really enjoying classifying beautiful pictures from the ISS. It made me feel like an astronaut, and I loved that feeling, so we shared it with our friends and colleagues.
We really believed in the project, especially Alejandro, so he invited me to meet his PhD advisor and his colleagues. We met and studied how we could improve it. As a result, two new projects were born over the next months: Lost at night and Night Cities ISS.
The small announcement that became huge
After a lot of work, Alejandro thought the projects were good enough to send to NASA and ESA. He wrote a press release and shared what we were doing with them.
At first we thought they would ignore us, but something happened. It started as a tremor. With a tweet:
Then, almost one month later NASA wrote a full article about the project and tweeted about it:
Thanks to this coverage, in just one month we were able to classify more than 100 thousand images. One day the Crowdcrafting servers stored more than 1.5 answers per second! We were like this:
The calm after the storm
As with any press coverage, after a few weeks everything went back to normal. However, lots of people kept coming and helping with Alejandro’s projects.
Over the following year we kept fixing bugs, adding new tasks, answering volunteers’ questions, sharing progress, etc. In July, Alejandro defended his thesis built on all this work. Amazing!
From my side, I’m really happy and proud of it for two reasons. First, even though the thesis has been presented, the projects keep going.
At the time of this writing the Dark Skies project has classified almost 700 images in the last 15 days. Amazing!
Secondly, because this is the very first thesis that uses PyBossa and Crowdcrafting for doing open research. I’m impressed, and I think this is just the beginning for many more researchers doing their research in the open and inviting society to take part in it.
The future? Well, Alejandro has launched a Kickstarter campaign to get financial support to keep running the research he’s doing. If he gets it, more data will be analyzed, new results will be produced, and it will help keep Crowdcrafting and PyBossa running. So, if you like the project, help Alejandro build the most beautiful atlas of Earth at night!
In a previous post I’ve already said that I love uWSGI. The main reason? You can do lots of nice tricks in your stack without having to add other layers to it, like for example: graceful reloading.
The uWSGI documentation is really great and covers most graceful-reloading cases; however, due to our current stack and our autodeployments solution, we needed something that integrated well with the so-called Zerg dance.
The Zerg mode is a nice feature from uWSGI that allows you to run your web application passing file descriptors over Unix sockets. As stated on the official docs:
Zerg mode works by making use of the venerable “fd passing over Unix sockets” technique.
Basically, an external process (the zerg server/pool) binds to the various sockets required by your app. Your uWSGI instance, instead of binding by itself, asks the zerg server/pool to pass it the file descriptor. This means multiple unrelated instances can ask for the same file descriptors and work together.
This is really great, as you only need to enable a Zerg server and then you are ready to use it.
As we use Supervisor, configuring uWSGI to run as a Zerg server is really simple:
[uwsgi]
master = true
zerg-pool = /tmp/zerg_pool_1:/tmp/zerg_master.sock
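For completeness, the Supervisor side is just an ordinary program entry that keeps the Zerg pool alive; something along these lines (the program name and ini path are assumptions, not our exact config):

```ini
[program:zerg-pool]
; hypothetical entry: run the Zerg server defined in the uWSGI ini above
command = uwsgi --ini /etc/uwsgi/zerg-server.ini
autostart = true
autorestart = true
stopsignal = QUIT
```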
Then, you configure your web application to use the zerg server:
[uwsgi]
zerg = /tmp/zerg_master.sock
And you are done! That configures your server to run in Zerg mode. However, we can configure it to handle reloading in a more useful way: keeping a binary copy of the previously running instance, pausing it, and deploying the new code on a new Zerg. This is known as the Zerg dance, so let’s dance!
With the Zerg dance we’ll be able to do deployments while users keep using the web application, as the Zerg server will always handle their requests properly.
The neat trick in uWSGI is that it pauses those requests while the new deployment takes place, so the user just thinks the site got a bit slower. As soon as the new deployment is running, it moves the “paused requests” to the new code and keeps the old copy in case you broke something. Nice, right?
To achieve this, all you have to do is use 3 different FIFOs in uWSGI. Why? Because uWSGI can have as many master FIFOs as you want, allowing you to pause Zerg servers and move between them. This feature lets us keep a binary copy of previously deployed code on the server, which you can pause/resume and use when something goes wrong.
This is really fast. The only issue is that you’ll need more memory on your server, but I think it’s worth it, as you’ll be able to roll back a deployment with just two commands (we’ll see that in a moment).
Configuring the 3 FIFOs
The documentation has a really good example. All you have to do is to add 3 FIFOs to your web application uWSGI config file:
[uwsgi]
; fifo '0'
master-fifo = /var/run/new.fifo
; fifo '1'
master-fifo = /var/run/running.fifo
; fifo '2'
master-fifo = /var/run/sleeping.fifo
; attach to zerg
zerg = /var/run/pool1
; other options ...
; hooks
; destroy the currently sleeping instance
if-exists = /var/run/sleeping.fifo
  hook-accepting1-once = writefifo:/var/run/sleeping.fifo Q
endif =
; force the currently running instance to become the sleeping one (slot 2) and place it in pause mode
if-exists = /var/run/running.fifo
  hook-accepting1-once = writefifo:/var/run/running.fifo 2p
endif =
; force this instance to become the running one (slot 1)
hook-accepting1-once = writefifo:/var/run/new.fifo 1
After the FIFOs there is a section where we declare some hooks. These hooks automatically handle which FIFO has to be used when a server is started again.
The usual work flow will be the following:
- You start the server.
- There is no sleeping or running FIFO, so those conditions fail.
- Therefore, once the server is ready to accept requests (thanks to hook-accepting1-once), it moves itself from new.fifo to running.fifo.
Right now you have a server running as before. Now imagine you have to change something in the config, or you have a new deployment. You make the changes and start a new server with the same uWSGI config file. This will happen:
- You start the second server.
- There is no sleeping FIFO, so this condition fails.
- There is a running FIFO, so this condition is met. Thus, the previous server is moved to the sleeping FIFO and is paused when the new server is ready to accept requests.
- Finally, once the server is ready to accept requests, it moves itself from new.fifo to running.fifo.
At this moment we have two servers: one running (the new one, with your new code or config changes) and the old one, which is paused and only consuming some memory.
Imagine now you realize that you have a bug in your new deployed code. How do you recover from this situation? Simple!
You just pause the new server and unpause the previous one. How do you do it? Like this:
echo 1p > /var/run/running.fifo
echo 2p > /var/run/sleeping.fifo
With our auto deployments solution, we needed to find a simple way to integrate this feature with supervisor. In the previous example you do the deployment manually, but we want to have everything automated.
How did we achieve this? Simple! By using two PyBossa servers within Supervisor.
We have the default PyBossa server, and another one named pybossabak in Supervisor.
When a new deployment happens, the autodeployments solution boots the pybossabak server just to have a copy of the running state of the server. Then it fetches all the new changes, applies patches, etc., and restarts the default server. This procedure triggers the following:
- Start the backup server: this moves the currently running PyBossa server to the pause FIFO, so we have a copy of it.
- The backup server accepts the requests, so users don’t see anything wrong.
- Autodeployments applies changes to the source code, updates libraries, etc.
- Then it restarts the default PyBossa server (note: as far as Supervisor is concerned, the paused PyBossa server is still running).
- This restart moves the previous backup server to the pause FIFO (it is running the old code) and boots the new code into production.
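Sketched in code, the sequence the autodeployments tool runs looks roughly like this (the `run` parameter is my addition so the sequence can be exercised without a live Supervisor; the program names come from our setup, and the exact update steps are simplified):

```python
import subprocess

# Hypothetical sketch of the automated deployment sequence described
# above. Each step shells out to the corresponding command.
def deploy(run=subprocess.check_call):
    run(["supervisorctl", "start", "pybossabak"])      # backup Zerg takes over serving
    run(["git", "pull"])                               # fetch the new code
    run(["pip", "install", "-r", "requirements.txt"])  # update libraries
    run(["supervisorctl", "restart", "pybossa"])       # new code becomes the running Zerg
```

Passing a recording function as `run` makes the sequence easy to test without touching Supervisor or the network.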
If something goes wrong with the new changes, all we have to do is pause the current server and resume the previous one.
This is done by hand, as we want to have control over this specific step, but overall we are always covered when doing deployments automatically. We only have to click the Merge button on GitHub to do a deployment, and we know a binary backup copy is held in memory in case we make a mistake.
Moreover, the way uWSGI moves users’ requests from one server to the other during the whole process is great!
We’ve seen some users getting a 502, but that’s because their request arrives while the file descriptor is being moved to the new server. Obviously, this is not 100% bulletproof, but it’s much better than showing all your users a maintenance page while you do the upgrade.
We’ve been using this new workflow for a few weeks now, and all our production deployments are done automatically. Since we adopted this approach we haven’t had any issues, and we are more focused on developing more code. We spend less time handling deployments, which is great!
In summary: if you are using uWSGI, use the Zerg Dance, and enjoy the dance!
This is an old, unpublished post that I never got around to publishing for some reason… In the last week of March, I started to think about how the Open Science Framework (OSF) can foster a non-technical community. At first, I thought only about advocacy and teaching the scientific process. But after the response […]