Posts tagged "free-culture":

26 Nov 2014

Is open and free global land cover mapping possible?

Short answer: yes.

Mid November took place in Toulouse "Le Capitole du libre", a conference on Free Software and Free Culture. The program this year was again full of interesting talks and workshops.

This year, I attended a workshop about contributing to Openstreetmap (OSM) using the JOSM software. The workshop was organised by Sébastien Dinot who is a massive contributor to OSM, and more importantly a very nice and passionate fellow.

I was very happy to learn to use JOSM and did 2 minor contributions right there.

During the workshop I learned that, over the past, OSM has been enriched using massive imports from open data sources, like for instance cadastral data bases from different countries or the Corine Land Cover data base. This has been possible thanks to the policies of many countries which have understood that the commons are important for the advancement of society. One example of this is the European INSPIRE initiative.

I was also interested to learn that what could be considered niche data, like agricultural land parcel data bases as for instance the French RPG have also been imported into OSM. Since I have been using the RPG at work for the last 4 years (see for example here or here), I was sympathetic with the difficulties of OSM contributors to efficiently exploit these data. I understood that the Corine Land Cover import was also difficult and the results were not fully satisfactory.

As a matter of fact, roads, buildings and other cartographic objects are easier to map than land cover, since they are discrete and sparse. They can be pointed, defined and characterised more easily than natural and semi-natural areas.

After that, I could not avoid making the link with what we do at work in terms of preparing the exploitation of upcoming satellite missions for automatic land cover map production.

One of our main interests is the use of Sentinel-2 images. It is the week end while I am writing this, so I will not use my free time to explain how land cover map production from multi-temporal satellite images work: I already did it in my day job.

What is therefore the link between what we do at work and OSM? The revolutionary thing from my point of view is the fact that Sentinel-2 data will be open and free, which means that the OSM project could use it to have a constantly up to date land cover layer.

Of course, Sentinel-2 data will come in huge volumes and a good amount of expertise will be needed to use them. However, several public agencies are paving the road in order to deliver data which is easy to use. For instance, the THEIA Land Data Centre will provide Sentinel-2 data which is ready to use for mapping. The data will be available with all the geometric and radiometric corrections of the best quality.

Actually, right now this is being done, for instance, for Landsat imagery. Of course, all these data is and will be available under open and free licences, which means that anyone can start right now learning how to use them.

However, going from images to land cover maps is not straightforward. Again, a good deal of expertise and efficient tools are needed in order to convert pixels into maps. This is what I have the chance to do at work: building tools to convert pixels into maps which are useful for real world applications.

Applying the same philosophy to tools as for data, the tools we produce are free and open. The core of all these tools is of course the Orfeo Toolbox, the Free Remote Sensing Image Processing Library from CNES. We have several times demonstrated that the tools are ready to efficiently exploit satellite imagery to produce maps. For instance, in this post here you even have the sequence of commands to generate land cover maps using satellite image time series.

This means that we have free data and free tools. Therefore, the complete pipeline is available for projects like OSM. OSM contributors could start right now getting familiar with these data and these tools.

Head over to CNES servers to get some Landsat data, install OTB, get familiar with the classification framework and see what could be done for OSM.

It is likely that some pieces may still be missing. For instance, the main approach for the map production is supervised classification. This means that we use machine learning algorithms to infer which land cover class is present at every given site using the images as input data. For these machine learning algorithms to work, we need training data, that is, we need to know before hand the correct land cover class in some places so the algorithm can be calibrated.

This training data is usually called ground truth and it is expensive and difficult to get. In a global mapping context, this can be a major drawback. However, there are interesting initiatives which could be leveraged to help here. For instance, Geo-Wiki comes to mind as a possible source of training data.

As always, talk is cheap, but it seems to me that exciting opportunities are available for open and free quality global mapping. This does not mean that the task is easy. It is not. There are many issues to be solved yet and some of them are at the research stage. But this should not stop motivated mappers and hackers to start learning to use the data and the tools.

04 Feb 2012

Computers run the world, I run computers, therefore ...

The war on general computation

Although I am not particularly fan of Science Fiction, I follow Cory Doctorow's production because of his views on free culture and free software. He has written many essays about the problems with copyright law and how copyright enforcement is a waste of time. He is himself an example of a creator who is able to make a living making available his work without copy restrictions.

Doctorow has also written about the links between free culture (Creative Commons) and free software, and I don't think I am wrong saying that it all boils down to the same thing.

I recently listened to one episode of the Cory Doctorow podcast about the war on general purpose computation. It was a re-run of the talk he gave at the 28th Chaos Communications Congress in Berlin around last Christmas. The YouTube video with subtitles is available here.

Doctorow has been recently interviewed on CBC about the same topic.

To make a quick synthesis of the talk, we could say that the content industry, in its blind war on copyright infringement, will tend to try to eliminate general purpose computers – that is computers that you can use for whatever you want – and replace them by appliances – that is devices (stripped down computers) which serve one single purpose: content consumption.

At first sight, this may seem some conspiracy theory, but, actually, Apple has started applying the model. For instance, if I write a program for an iPad and I want to sell it to you, the only legal way I have is to go through the Apple Store (and Apple takes 30\%). Otherwise, you have to jailbreak your device. The same kind of trend is coming to Apple desktop and laptop computers. And not much different is the Secure Boot which Microsoft wants to force on computer manufacturers. Google's Chromebooks and other Android tablets limit you ins a somewhat similar way.

Listen to Doctorow's talk to get the full picture. One of the consequences of this war on general purpose computation is that it might even become illegal – I don't think impossible, however – to run free software on most devices available in the market.

The scientist

Since most, if not all, of the science depends in one or another way on computers, for the scientist, this has major implications.

The 2 main pillars of science are independence and reproducibility. These means that a scientist must be able to explain and verify any single result or statement she makes public. In the case of results produced using computer software, being able to look under the hood of the programs used for scientific computation is a fundamental need.

Of course, I am not advocating for the fact that the scientist should code everything herself. This would be too inefficient. But there is when open source software comes to rescue. One can rely on existing tools (libraries, etc.) without having to accept the black box approach of proprietary tools.

I am always astounded to see colleagues using tools for which they don't have the source code. The argument is always that they are being pragmatic, and that theoretical principles about software freedom can be traded against convenience. I can't agree with this.

The day when a bug is detected in one of these software and that you don't have other possibility to get things working correctly than waiting for the new version, you understand that there is a problem.

The day when you can not work because there is a limited number of licenses available in the lab, you understand that there is a problem.

The day you see your students using an illegal version of the software, because they want to improve their skills by working at home, you understand that there is a problem.

And these are only a limited number of real life examples that I have seen around me several times.

Yes, many open source tools lack the degree of polish that proprietary tools have. Some have poor documentation. Some are less user friendly and impose a steep learning curve. The good point is that you can contribute by improving the tool yourself. If you don't have the required skills to do it, you may convince your boss to use the same amount of money she would put into a proprietary software license to hire someone who has the skills to improve the free software you need. After that, do not forget to make your improvements available, so that others benefit from them and consider doing the same.

OK. Enough about work. Let's go home.

The geek dad, or just dad

Most households have at least one computer. Many households have several of them. Nowadays, kids have access to computers from they early age.

Several weeks ago, my children (5 and 7) discovered a web browser in the computer they use for games at home. They also quickly discovered the search engine and the possibility to search on YouTube for videos of their favorite cartoon characters. (In another post I will write about how to explain copyright law to a child who is upset because she can't find on-line the cartoon she wants to watch).

I quickly realized that I hadn't thought about how to filter what they access on the net, so I started looking for something to install.

I know that my ISP proposes a service for that. I know that there are on-line services you can subscribe to. But when it comes to my children, I am not going to give any company the permission to log their web access.

So what I did was installing a free software called DansGuardian which acts as a proxy and filters http requests sent by a browser. It took me 5 minutes to install it and, 10 to understand how to configure it, and 10 more to select the filtering settings I want. OK. I had to read the documentation and mess around with configuration files, but I know exactly what happens, and it's me who looks at the http logs and not an unknown person. Yes, you can say that I am a control freak, and you are right, because, by now, my children are only allowed to use the web browser when their mother or me are with them.

I will call this parental responsibility.

This responsibility includes also teaching critical thinking and giving the freedom to learn. Unfortunately, they don't learn that at school. They don't learn computer skills either. Granted they have computers in the classroom (hey, we are a modern country!), but the teachers are not trained to teach them anything other than to play pseudo-educational games.

The Linux computer my children use has the same kind of software available. They mainly use Tux Paint and GCompris. So what's better with respect to what they have at school?

It's free (as in freedom), so I am confident on being able to run the software on a low end computer if I want to. I can change things that they don't like or that I don't like. For instance, I added their pictures to the set of available Tux Paint stamps. The software is available in many languages, many more than many proprietary software. And this is very useful for kids to learn, even if their mother tongue is a minority one (there are many of them in Europe). Even best, if the language was not available, a parent could translate the software, since all the text messages and the recorded voices are stored in separate files.

This gives a massive freedom to learn, but that's not all. One thing children like very much is taking toys apart. This is not a destructive obsession, but just a need to know how things work. What's the point of playing with software which can't be taken apart to see how it works?

If the war on general purpose computation becomes a widespread reality, our children may not be able to learn to program, or at least, not without spending lots of money in order to buy expensive tools for that. And too bad for those who don't have the money.

They may not be able to put their own pictures on Tux Paint without proving that they own the copyright, and that would be dramatic.

03 Jan 2011

Reproducible research

I have recently implemented the Spectral Rule Based Landsat TM image classifier described in ¹.

This paper proposes a set of radiometric combinations, thresholds and logic rules to distinguish more than 40 spectral categories on Landsat images. My implementation is available in the development version of the Orfeo Toolbox and should be included in the next release:

One interesting aspect of the paper is that all the information needed for the implementation of the method is given: every single value for thresholds, indexes, etc. is written down in the paper. This was really useful for me, since I was able to code the whole system without getting stuck on unclear things or hidden parameters. This is so rarely found in image processing literature that I thought it was worth to post about it. But this is not all. Once my implementation was done, I was very happy to get some Landsat classifications, but I was not able to decide whether the results were correct or not. Since the author of the paper seemed to want his system to be used and gave all details for the implementation, I thought I would ask him for help for the validation. So I sent an e-mail to A. Baraldi (whom I had already met before) and asked for some validation data (input and output images generated by his own implementation). I got something better than only images. He was kind enough to send me the source code of the very same version of the software which was used for the paper – the system continues to be enhanced and the current version seems to be far better than the one published. So now I have all what is needed for reproducible research:

A clear description of the procedure with all the details needed for the implementation.
Data in order to run the experiments.
The source code so that errors can be found and corrected.

I want to publicly thank A. Baraldi for his kindness and I hope that this way of doing science will continue to grow. If you want to know more about reproducible research, check this site.

Footnotes:

Baraldi et al. 2006, "Automatic Spectral Rule-Based PreliminaryMapping of Calibrated Landsat TM and ETM+ Images", IEEE Trans. on Geoscience and Remote Sensing, vol 44, no 9.