Lambda Omega Lambda
In my current research work, I am investigating the possibility of integrating domain expert knowledge with image processing techniques (segmentation, classification) in order to produce accurate land cover maps from remote sensing image time series. In terms of tools (software), a set of requirements has to be met in order to go from toy problems to operational processing chains. Scientists need these operational chains for carbon and water cycle studies, climate change analysis, etc. In the coming years, the image data needed for this kind of near-real-time mapping will be available through Earth observation missions like ESA's Sentinel program. In short, the software tools for this task must satisfy the following constraints:
- Interactive environment for exploratory analysis
- Concurrent/parallel processing
- Availability of state of the art, validated image processing algorithms
- Symbolic, semantic information processing
This long post (or short article) describes some of my thoughts and the conclusions I have come to after a rather long analysis of the available tools.
Need for a repl
The acronym repl stands for Read-Eval-Print Loop, and I think it has its origins in the first interactive Lisp systems. Nowadays, we call this an interactive interpreter. People used to numerical and/or scientific computing will think of Matlab, IDL, Octave, Scilab and the like. Computer scientists will think of the Python/Ruby/Tcl/etc. interpreters. Many people now use Python+Numpy or Scipy (now gathered in Pylab) in order to have something similar to Matlab/IDL/Scilab/Octave, but with a general-purpose programming language. A repl lets you explore the problem at hand interactively, without going through the write-compile-run cycle. The simple script that I wrote for the OTB blog using OTB's Python bindings could be typed directly into the interpreter, bringing OTB and Pylab together.
Need for concurrent programming
Without going into the details of the differences between parallel and concurrent programming, it seems clear that the performance gains we used to get from Moore's law can now only be obtained through multi-core architectures. For low-level (pixel-wise) processing, OTB provides an interesting solution: filters are executed in multiple threads by dividing images into chunks. This approach can be generalized to GPUs. However, sometimes an algorithm needs to operate on the whole image because splitting the image changes the results. This is typically the case for Markovian approaches to filtering and for image segmentation methods. In these cases, one way to speed things up is to process several images in parallel (if the memory footprint allows it!). One way of making use of all the available computing cores in Python is the multiprocessing module, which manages a pool of worker processes. A typical use is to map the same processing function over a list of images, one image per worker.
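A minimal sketch of this pattern, assuming each image fits in memory, is shown below. The function `segment_whole_image` is a hypothetical toy stand-in for a real OTB segmentation: it thresholds every pixel against the mean of the whole image, so its result depends on all the pixels and tiling the image would change it. Real code would rather pass file names to the workers and read the images there.

```python
import multiprocessing as mp
from statistics import mean


def segment_whole_image(job):
    """Toy whole-image "segmentation" (hypothetical stand-in for an OTB filter).

    Pixels above the mean of the *whole* image are labelled 1. Because the
    threshold depends on every pixel, processing tiles independently would
    give a different answer, so each worker receives a complete image.
    """
    name, pixels = job
    threshold = mean(pixels)
    return name, [1 if p > threshold else 0 for p in pixels]


def segment_all(images, processes=None):
    """Segment several whole images in parallel, one image per pool task."""
    with mp.Pool(processes=processes) as pool:
        return dict(pool.map(segment_whole_image, images.items()))


if __name__ == "__main__":
    # Hypothetical in-memory "images" (flattened pixel lists) standing in
    # for the files an OTB reader would load inside each worker.
    images = {
        "date1": [10, 20, 30, 40],
        "date2": [5, 5, 50, 60],
        "date3": [1, 2, 3, 100],
    }
    for name, labels in sorted(segment_all(images, processes=3).items()):
        print(name, labels)
```

For instance, thresholding the tile [1, 2] on its own marks the pixel 2 as foreground, whereas in the whole image [1, 2, 3, 100] it is background. The `if __name__ == "__main__":` guard is needed on platforms where worker processes are spawned rather than forked.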
However, this does not allow for easy inter-process communication. That is not needed when each image is processed independently, but it can be very useful if the different processes are working on the same image: imagine a multi-agent system where classifiers, algorithms for biophysical parameter extraction, data assimilation techniques, etc. work together to produce an accurate land cover map. They may want to communicate in order to share information. Threads would make this sharing easier, but as far as I understand, Python threads have some limitations due to the global interpreter lock. Some languages, for instance Erlang, offer appropriate concurrency primitives for this (actors exchanging messages). I have played a little bit with them in my exploration of Seven Languages in Seven Weeks. Unfortunately, there are no OTB bindings for Erlang. Scala has copied the Erlang actors, but I didn't really get into the Scala thing.
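To make the actor idea concrete, here is a minimal sketch of an actor in Python: each actor owns a private mailbox (a `queue.Queue`) drained by one dedicated thread, and actors only interact by sending messages. The `Actor` class, the NDVI-based classification rule and the pixel values are hypothetical toy examples. Because these are CPython threads, the global interpreter lock still prevents them from running Python code in parallel, which is precisely the limitation mentioned above; Erlang processes do not have this restriction.

```python
import queue
import threading


class Actor:
    """Minimal actor (hypothetical sketch): a private mailbox and one thread.

    Messages are handled one at a time, in arrival order, so the handler
    never needs locks for state that only this actor touches.
    """

    _STOP = object()  # sentinel telling the message loop to exit

    def __init__(self, handler):
        self._handler = handler
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self):
        """Handle every message already queued, then end the actor's thread."""
        self._mailbox.put(Actor._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is Actor._STOP:
                break
            self._handler(message)


# Toy multi-agent example (hypothetical rule and data): a classifier labels
# pixels from their NDVI and forwards the labels to a map-building actor.
land_cover = {}


def build_map(message):
    pixel_id, label = message
    land_cover[pixel_id] = label


builder = Actor(build_map)


def classify(message):
    pixel_id, ndvi = message
    builder.send((pixel_id, "vegetation" if ndvi > 0.3 else "bare soil"))


classifier = Actor(classify)

for pixel_id, ndvi in enumerate([0.1, 0.6, 0.45]):
    classifier.send((pixel_id, ndvi))

classifier.stop()  # every classification has now been forwarded
builder.stop()     # the builder drains its mailbox before stopping
print(land_cover)  # {0: 'bare soil', 1: 'vegetation', 2: 'vegetation'}
```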
Need for OTB access
This one shouldn't need much explanation. Efficient remote sensing image processing needs OTB. Period. I am sorry, I am rather biased on that! I like C++ and I have no problem using it. But there is no repl, and one needs several lines of typedefs before being able to use anything. This is the price to pay for good static type checking before running the program. And it's damned fast! We have Python bindings, which give us a clean syntax, as in the pipeline example of the OTB tutorials. However, the lack of easy concurrency is a bad point for Python. The lack of Artificial Intelligence frameworks for Python is another drawback. Java has them, but Java has no repl, and look at its syntax: it's worse than C++. Class names that are clean in C++ and Python become mangled into things like otbImageFileReaderIUS2. Scala, thanks to its interoperability with Java (Scala runs on the JVM), can use the OTB Java bindings, and with a cleaner syntax than Java's.
There is still the problem of the mangled names, but with some pattern matching or case classes, it should disappear. So Scala seems a good candidate. It has:
- Python-like syntax (although statically typed)
- Concurrency primitives
- A repl
Unfortunately, Scala is not a Lisp. Bear with me.
A Lisp would be nice
I want to build an expert system: it would be nice to have something for remote sensing similar to Wolfram Alpha. We could call it Ω Toolbox and keep the OTB name (or close). Why Lisp? Well, I am not able to explain that here, but you can read P. Norvig's or P. Graham's essays on the topic. If you have a look at books like PAIP or AIMA, or at systems like LISA, CLIPS or JESS, they are either written in Lisp or offer a Lisp-like DSL. I am aware of implementations of the AIMA code in Python, and P. Norvig himself has given his reasons for migrating from Lisp to Python, but as stated above, Python seems to be out of the game for me. The code-is-data philosophy of Lisp is, as far as I understand it, together with the repl, one of its main assets for AI programming. Another important aspect is the functional programming paradigm used in Lisp (even though other programming paradigms are also available in Lisp). Concurrency is one of the main reasons for the resurgence of functional languages in recent years (Haskell, for instance). Even though I (still) don't see the need for pure functional programming in my applications, lambda calculus is elegant and interesting. Maybe λ Toolbox would be a more appropriate name?
Clojure
To sum up, if we want:
- A repl (no C++, no Java)
- Concurrency (no Python)
- OTB bindings available (no Erlang, no Haskell, no Ruby)
- Lisp (none of the above)
there is one single candidate: Clojure. Clojure is a Lisp dialect which runs on the JVM and has nice concurrency features such as immutable data structures, software transactional memory (STM) and agents. And by the way, the OTB Java bindings work like a charm from Clojure.
Admittedly, the syntax is less beautiful than Python's or Scala's, but (because!) it's a Lisp. And it's a better Java than Java. So you have all the Java libraries available, and even Clojure-specific repositories like Clojars. A particularly interesting project is Incanter, a Clojure-based, R-like platform for statistical computing and graphics. Have a look at this presentation to get an overview of what you can do with it at the Clojure repl. If we bear in mind that in Lisp code is data and that Lisp macros are mega-powerful, one could imagine writing things like:
(make-otb-pipeline reader gradientFilter thresholdFilter writer)
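Here is a rough sketch, in Python, of the wiring such a `make-otb-pipeline` form could expand to: plug each filter's output into the next filter's input and hand back the last stage. The `Filter` class and `make_pipeline` are hypothetical; they only mimic the `SetInput`/`GetOutput`/`Update` protocol of ITK/OTB process objects (lazy, pull-driven execution) and are not the OTB API. `Update` returns the data here only so that the result can be inspected.

```python
class Filter:
    """Hypothetical stand-in for an OTB/ITK filter (not the real API).

    Nothing is computed when filters are connected; calling Update() on the
    last stage pulls data through the whole chain, as in an ITK pipeline.
    """

    def __init__(self, name, compute):
        self.name = name
        self.compute = compute  # function applied to the upstream data
        self.upstream = None    # filter feeding this one, if any

    def SetInput(self, upstream_output):
        self.upstream = upstream_output

    def GetOutput(self):
        # In ITK this would be a data object tied to the filter; here the
        # filter itself stands for its not-yet-computed output.
        return self

    def Update(self):
        data = self.upstream.Update() if self.upstream is not None else None
        return self.compute(data)


def make_pipeline(*filters):
    """Plug each filter into the next one and return the last stage."""
    for upstream, downstream in zip(filters, filters[1:]):
        downstream.SetInput(upstream.GetOutput())
    return filters[-1]


# Toy 1-D "image" and stages standing in for the real reader, gradient,
# threshold and writer filters.
reader = Filter("reader", lambda _: [3, 8, 1, 9])
gradient_filter = Filter("gradient", lambda img: [b - a for a, b in zip(img, img[1:])])
threshold_filter = Filter("threshold", lambda grad: [1 if abs(g) > 6 else 0 for g in grad])
writer = Filter("writer", lambda labels: labels)  # a real writer would save to disk

pipeline = make_pipeline(reader, gradient_filter, threshold_filter, writer)
print(pipeline.Update())  # [0, 1, 1]
```

A Clojure macro could generate the corresponding `.SetInput`/`.GetOutput` interop calls at compile time instead of looping at run time; the wiring would be the same.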
Or even emulating the C++ template syntax to avoid using the mangled names of the OTB classes in the Java bindings (using macros and keywords):
(def rescaler (RescaleIntensityImageFilter :itk (Image. :otb :Float 2) (Image. :otb :UnsignedChar 2)))
instead of
(def rescaler (itkRescaleIntensityImageFilterIF2IUC2.))
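The mangled names follow a regular pattern, which is what makes the macro idea plausible: the class name is prefixed with its library (`itk` or `otb`) and suffixed with one code per template argument, such as `IF2` for a 2-D image of `Float` and `IUC2` for a 2-D image of `UnsignedChar`. The sketch below rebuilds the two names quoted in this post; `mangled_name`, `image_code` and the pixel-type table are hypothetical helpers, and the table is an assumption inferred only from those two names.

```python
# Pixel-type abbreviations, inferred (assumption) from the wrapped class
# names quoted in the text; other pixel types are not covered.
PIXEL_CODES = {
    "Float": "F",
    "UnsignedChar": "UC",
    "UnsignedShort": "US",
}


def image_code(pixel_type, dimension):
    """Code for an image template argument, e.g. ("Float", 2) -> "IF2"."""
    return f"I{PIXEL_CODES[pixel_type]}{dimension}"


def mangled_name(library, class_name, *images):
    """Build a wrapped class name from a template-like description.

    library: "itk" or "otb", the namespace of the C++ class.
    class_name: the C++ class template name.
    images: (pixel_type, dimension) pairs, one per template argument.
    """
    return library + class_name + "".join(image_code(*img) for img in images)


print(mangled_name("itk", "RescaleIntensityImageFilter",
                   ("Float", 2), ("UnsignedChar", 2)))
# itkRescaleIntensityImageFilterIF2IUC2
print(mangled_name("otb", "ImageFileReader", ("UnsignedShort", 2)))
# otbImageFileReaderIUS2
```

A Clojure macro could perform the same translation at compile time from keywords like `:Float` and `:UnsignedChar`, and then emit the constructor call for the resulting class.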
I have already found a cool syntax for using the setters and getters of a filter: the doto macro.
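For readers who don't know Clojure: `(doto x (.f a) (.g b))` evaluates `x` once, calls each following form with `x` inserted as its first argument, and returns `x` itself, which is handy for configuring a Java object through a series of setters. The Python sketch below emulates that behaviour. The `RescaleFilter` class is a hypothetical stub; its `SetOutputMinimum`/`SetOutputMaximum` setters mirror the names used by ITK's `RescaleIntensityImageFilter`, but nothing here calls OTB or ITK.

```python
def doto(obj, *calls):
    """Emulate Clojure's doto: call each (method_name, *args) on obj, return obj."""
    for method_name, *args in calls:
        getattr(obj, method_name)(*args)
    return obj


class RescaleFilter:
    """Hypothetical stub with ITK-style setters and getters."""

    def __init__(self):
        self._min = None
        self._max = None

    def SetOutputMinimum(self, value):
        self._min = value

    def SetOutputMaximum(self, value):
        self._max = value

    def GetOutputMinimum(self):
        return self._min

    def GetOutputMaximum(self):
        return self._max


# Clojure analogue: (doto (RescaleFilter.) (.SetOutputMinimum 0) (.SetOutputMaximum 255))
rescaler = doto(RescaleFilter(),
                ("SetOutputMinimum", 0),
                ("SetOutputMaximum", 255))
print(rescaler.GetOutputMinimum(), rescaler.GetOutputMaximum())  # 0 255
```

Because `doto` returns the configured object, the result can be bound directly with `def` or passed straight into a pipeline.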
Conclusion
I am going to push my investigation of Clojure further, because it seems to fit my needs:
- Has an interactive interpreter
- Access to OTB (through the Java bindings)
- Concurrency primitives (agents, STM, etc.)
- It's a Lisp, so I can easily port existing rule-based expert systems.
Since this would be the sum of many cool features, I think I should call it Σ Toolbox, but I don't like the name. The mix of λ calculus and the Ω Toolbox should rather be called λΩλ, which is LoL in Greek.