Posts tagged "linux":

25 Mar 2015

Scripting in C++

Introduction

This quote of B. Klemens¹ that I used in my previous post made me think a little bit:

"I spent much of my life ignoring the fundamentals of computing and just hacking together projects using the package or language of the month: C++, Mathematica, Octave, Perl, Python, Java, Scheme, S-PLUS, Stata, R, and probably a few others that I’ve forgotten."

Although much less skilled than M. Klemens, I have also wandered around the programming language landscape, out of curiosity, but also with particular needs in mind.

For example, I listed a set of features that I found useful for easily going from prototyping to operational code for remote sensing image processing. The main 3 bullet points were:

OTB accessibility, which meant C++, Python or Java (and its derivatives as Clojure);
Robust concurrent programming, which ruled out Python;
Having a REPL, which led me to say silly things and even develop working code.

Before going the OTB-from-LISP-through-bindings route, I had looked for C++ interpreters to come close to a REPL, some are available, but, at least at the time, I didn't manage to get one running.

Since I started using C++11 a little bit, I find more and more pleasant to write C++ even for small things that I would usually use Python for. Add to that all the new things in the standard library, like threads, regular expressions, tuples and proper random number generators, just to name a few, and here I am wondering how to make the edit/compile/run loop shorter.

B. Klemens proposes crazy things like compiling via cut/paste, but this is too extreme even for me. Since I use emacs, I can compile from within the editor just by hitting C-c C-c which asks for a command to compile. I usually do things like:

export ITK_AUTOLOAD_PATH="" && cd ~/Dev/builds/otb-bv/ && make \
    && ctest -I 8,8 -VV && gnuplot ~/Dev/otb-bv/data/plot_bvMultiT.gpl

I am nearly there, but not yet.

Remember that I discovered Klemens because I was looking for statistical libs after watching some videos of the Coursera Data Science specialization? One of the courses is about reproducible research and they show how you can mix together R code and Markdown to produce executable research papers.

When I saw this R Markdown approach, I thought, well, that's nice, but inferior to org-babel. The main advantage of org-babel with respect to R Markdown or IPython is that in org-mode you can use different languages in the same document.

In a recent research report written in org-mode and exported to PDF, I used, in the same document, Python, bash, gnuplot and maxima. This is a real use case. Of course, these are all interpreted languages, so it is easy to embed them in documents and talk to the interpreter.

Well, 15 years after I started using emacs and more than 6 years after discovering org-mode, I found that compiled languages can be used with org-babel. At least, decent compiled languages, since there is no mention of Objective-C or Java².

Setting up and testing

So how one does use C++ with org-babel? Just drop this into your emacs configuration:

Yes, don't forget the flag for C++11, since you want to be cool.

Now, let the fun begin. Open an org buffer and write:

#+begin_src C++ :includes <iostream>
std::cout << "Hello scripting world\n";
#+end_src

Go inside the code block and hit C-c C-c and you will magically see this appear just below the code block:

#+RESULTS:
: Hello scripting world

No problem. You are welcome. Good-bye.

If you want to go beyond the Hello world example keep reading.

Using the standard library

You have noticed that the first line of the code block says it is C++ and gives information about the header files to include. If you need to include more than one header file³, just make a list of them (remember, emacs is a LISP machine):

#+begin_src C++ :includes (list "<iostream>" "<vector>")
std::vector<int> v{1,2,3};
for(auto& val : v) std::cout << val << "\n";
#+end_src

#+RESULTS:
| 1 |
| 2 |
| 3 |

Do you see the nice formatted results? This is an org-mode table. More fun with the standard library and tables:

#+begin_src C++ :includes (list "<iostream>" "<string>" "<map>")
std::map<std::string, int> m{{"pizza",1},{"muffin",2},{"cake",3}};
for(auto& val : m) std::cout << val.first << " " << val.second << "\n";
#+end_src

#+RESULTS:
| cake   | 3 |
| muffin | 2 |
| pizza  | 1 |

Using other libraries

Sometimes, you may want to use other libraries than the C++ standard one⁴. Just give the header files as above and also the flags (here I split the header of the org-babel block in 2 lines for readability):

#+header: :includes (list "<gsl/gsl_fit.h>" "<stdio.h>") 
#+begin_src C++ :exports both :flags (list "-lgsl" "-lgslcblas")
int i, n = 4;
double x[4] = { 1970, 1980, 1990, 2000 };
double y[4] = {   12,   11,   14,   13 };
double w[4] = {  0.1,  0.2,  0.3,  0.4 };

double c0, c1, cov00, cov01, cov11, chisq;

gsl_fit_wlinear (x, 1, w, 1, y, 1, n, 
                 &c0, &c1, &cov00, &cov01, &cov11, 
                 &chisq);

printf ("# best fit: Y = %g + %g X\n", c0, c1);
printf ("# covariance matrix:\n");
printf ("# [ %g, %g\n#   %g, %g]\n", 
        cov00, cov01, cov01, cov11);
printf ("# chisq = %g\n", chisq);

for (i = 0; i < n; i++)
  printf ("data: %g %g %g\n", 
          x[i], y[i], 1/sqrt(w[i]));

printf ("\n");

for (i = -30; i < 130; i++)
  {
  double xf = x[0] + (i/100.0) * (x[n-1] - x[0]);
  double yf, yf_err;

  gsl_fit_linear_est (xf, 
                      c0, c1, 
                      cov00, cov01, cov11, 
                      &yf, &yf_err);

  printf ("fit: %g %g\n", xf, yf);
  printf ("hi : %g %g\n", xf, yf + yf_err);
  printf ("lo : %g %g\n", xf, yf - yf_err);
  }
#+end_src

#+RESULTS:
| #     | best       |    fit: |       Y | = | -106.6 | + | 0.06 | X |
| #     | covariance | matrix: |         |   |        |   |      |   |
| #     | [          |  39602, |   -19.9 |   |        |   |      |   |
| #     | -19.9,     |   0.01] |         |   |        |   |      |   |
| #     | chisq      |       = |     0.8 |   |        |   |      |   |

etc.

Defining things outside main

You may have noticed that we are really scripting like perl hackers, since there is no main() function in our code snippets. org-babel kindly generates all the boilerplate for us. Thank you.

But wait, what happens if I want to define or declare things outside main?

No problem, just tell org-babel not to generate a main():

#+begin_src C++ :includes <iostream> :main no
struct myStr{int i;};

int main()
{
myStr ms{2};
std::cout << "The value of the member is " << ms.i << std::endl;
}
#+end_src

#+RESULTS:
: The value of the member is 2

There are many other options you can play with and I suggest to read the documentation. Just 2 more useful things.

Using org-mode tables as data

You saw that org-mode has tables. They are very powerful and can even be used as a spreadsheet. Let's see how they can be used as input to C++ code.

Let's start by giving a name to our table:

#+tblname: somedata
|  sqr | noise |
|------+-------|
|  0.0 |  0.23 |
|  1.0 |  1.31 |
|  4.0 |  4.61 |
|  9.0 |  9.05 |
| 16.0 | 16.55 |

Now, we can refer to it from org-babel using variables:

#+header: :exports results :includes <iostream>
#+begin_src C++ :var somedata=somedata :var somedata_cols=2 :var somedata_rows=5
for (int i=0; i<somedata_rows; i++)
  {
  for (int j=0; j<somedata_cols; j++) 
    {
    std::cout << somedata[i][j]/2.0 << "\t";
    }
  std::cout << "\n";
  }
#+end_src

#+RESULTS:
|   0 | 0.115 |
| 0.5 | 0.655 |
|   2 | 2.305 |
| 4.5 | 4.525 |
|   8 | 8.275 |

Tangle when happy

Once you have scripted your solution to save the world from hunger⁵, you may want to get it out from org-mode and upload it to github⁶. You can of course copy and paste the code, but since you are using a superior editor, you can ask it to do that⁷ for you. Just tell it the name of your file using the tangle option:

#+begin_src C++ :tangle MyApp.cpp :main no :includes <iostream>
namespace ns {
    class MyClass {};
}

int main(int argc, char* argv[])
{
ns::MyClass{};
std::cout << "hi" << std::endl;
return 0;
}
#+end_src

If you the call org-babel-tangle from this block, it will generate a file with the right name containing this:

#include <iostream>
namespace ns {
    class MyClass {};
}

int main(int argc, char* argv[])
{
ns::MyClass{};
std::cout << "hi" << std::endl;
return 0;
}

Conclusion

Since reproducible research in emacs has entered the realm of scientific journals⁸, I will end as follows:

In this article, we have shown the feasibility of coding in C++ without writing much boilerplate and going through the edit/compile/run loop. With the advent of modern C++ standards and richer standard libraries, C++ is a good candidate to quickly write throw-away code. The integration of C++ in org-babel gets C++ even closer to the tasks for which scripting languages are used for.

Future work will include learning to use the C++ standard library (data structures and algorithms), browsing the catalog of BOOST libraries, trying to get good C++ developers to use emacs and C++11, and find a generic variadic-template-based solution to world hunger.

Footnotes:

Ben Klemens, Modeling with Data: Tools and Techniques for Scientific Computing, Princeton University Press, 2009, ISBN: 9780691133140.

Well, this is just trolling, since Java is in the list of supported languages, and I guess that you can use gcc for Objective-C.

This may happen if you are writing code for a nuclear power plant.

⁴

To guide nuclear missiles towards the nuclear power plant of the evil guys, for instance.

⁵

Just send some nuclear bombs to those who are hungry?

⁶

Isn't that cool, to be on the same cloud as all those Javascript programmers?

⁷

But tell it to make coffee first.

⁸

Eric Schulte, Dan Davison, Tom Dye, Carsten Dominik. A Multi-Language Computing Environment for Literate Programming and Reproducible Research Journal of Statistical Software http://www.jstatsoft.org/v46/i03

18 Mar 2015

Data science in C?

Coursera is offering a Data Science specialization taught by professors from Johns Hopkins. I discovered it via one of their courses which is about reproducible research. I have been watching some of the video lectures and they are very interesting, since they combine data processing, programming and statistics.

The courses use the R language which is free software and is one the reference tools in the field of statistics, but it is also very much used for other kinds of data analysis.

While I was watching some of the lectures, I had some ideas to be implemented in some of my tools. Although I have used GSL and VXL¹ for linear algebra and optimization, I have never really needed statistics libraries in C or C++, so I ducked a bit and found the apophenia library², which builds on top of GSL, SQLite and Glib to provide very useful tools and data structures to do statistics in C.

Browsing a little bit more, I found that the author of apophenia has written a book "Modeling with data"³, which teaches you statistics like many books about R, but using C, SQLite and gnuplot.

This is the kind of technical book that I like most: good math and code, no fluff, just stuff!

The author sets the stage from the foreword. An example from page xii (the emphasis is mine):

" The politics of software

All of the software in this book is free software, meaning that it may be freely downloaded and distributed. This is because the book focuses on portability and replicability, and if you need to purchase a license every time you switch computers, then the code is not portable. If you redistribute a functioning program that you wrote based on the GSL or Apophenia, then you need to redistribute both the compiled final program and the source code you used to write the program. If you are publishing an academic work, you should be doing this anyway. If you are in a situation where you will distribute only the output of an analysis, there are no obligations at all. This book is also reliant on POSIX-compliant systems, because such systems were built from the ground up for writing and running replicable and portable projects. This does not exclude any current operating system (OS): current members of the Microsoft Windows family of OSes claim POSIX compliance, as do all OSes ending in X (Mac OS X, Linux, UNIX,…)."

Of course, the author knows the usual complaints about programming in C (or C++ for that matter) and spends many pages explaining his choice:

"I spent much of my life ignoring the fundamentals of computing and just hacking together projects using the package or language of the month: C++, Mathematica, Octave, Perl, Python, Java, Scheme, S-PLUS, Stata, R, and probably a few others that I’ve forgotten. Albee (1960, p 30)⁴ explains that “sometimes it’s necessary to go a long distance out of the way in order to come back a short distance correctly;” this is the distance I’ve gone to arrive at writing a book on data-oriented computing using a general and basic computing language. For the purpose of modeling with data, I have found C to be an easier and more pleasant language than the purpose-built alternatives—especially after I worked out that I could ignore much of the advice from books written in the 1980s and apply the techniques I learned from the scripting languages."

The author explains that C is a very simple language:

" Simplicity

C is a super-simple language. Its syntax has no special tricks for polymorphic operators, abstract classes, virtual inheritance, lexical scoping, lambda expressions, or other such arcana, meaning that you have less to learn. Those features are certainly helpful in their place, but without them C has already proven to be sufficient for writing some impressive programs, like the Mac and Linux operating systems and most of the stats packages listed above."

And he makes it really simple, since he actually teaches you C in one chapter of 50 pages (and 50 pages counting source code is not that much!). He does not teach you all C, though:

"As for the syntax of C, this chapter will cover only a subset. C has 32 keywords and this book will only use 18 of them."

At one point in the introduction I worried about the author bashing C++, which I like very much, but he actually gives a good explanation of the differences between C and C++ (emphasis and footnotes are mine):

"This is the appropriate time to answer a common intro-to-C question: What is the difference between C and C++? There is much confusion due to the almost-compatible syntax and similar name—when explaining the name C-double-plus⁵, the language’s author references the Newspeak language used in George Orwell’s 1984 (Orwell, 1949⁶; Stroustrup, 1986, p 4⁷). The key difference is that C++ adds a second scope paradigm on top of C’s file- and function-based scope: object-oriented scope. In this system, functions are bound to objects, where an object is effectively a struct holding several variables and functions. Variables that are private to the object are in scope only for functions bound to the object, while those that are public are in scope whenever the object itself is in scope. In C, think of one file as an object: all variables declared inside the file are private, and all those declared in a header file are public. Only those functions that have a declaration in the header file can be called outside of the file. But the real difference between C and C++ is in philosophy: C++ is intended to allow for the mixing of various styles of programming, of which object-oriented coding is one. C++ therefore includes a number of other features, such as yet another type of scope called namespaces, templates and other tools for representing more abstract structures, and a large standard library of templates. Thus, C represents a philosophy of keeping the language as simple and unchanging as possible, even if it means passing up on useful additions; C++ represents an all-inclusive philosophy, choosing additional features and conveniences over parsimony."

It is actually funny that I find myself using less and less class inheritance and leaning towards small functions (often templates) and when I use classes, it is usually to create functors. This is certainly due to the influence of the Algorithmic journeys of Alex Stepanov.

Footnotes:

Actually the numerics module, vnl.

Apophenia is the experience of perceiving patterns or connections in random or meaningless data.

Ben Klemens, Modeling with Data: Tools and Techniques for Scientific Computing, Princeton University Press, 2009, ISBN: 9780691133140.

⁴

Albee, Edward. 1960. The American Dream and Zoo Story. Signet.

⁵

See http://en.wikipedia.org/wiki/List_of_Newspeak_words#Prefixes

⁶

Orwell, George. 1949. 1984. Secker and Warburg.

⁷

Stroustrup, Bjarne. 1986. The C++ Programming Language. Addison-Wesley.

27 Oct 2014

xdg-open or "computer, do what I mean"

Command line nerds¹ say that graphical user interfaces are less efficient since the former use the keyboard and the latter tend to need the use of the mouse.

Graphical user interface fans say that command line based work-flows need the user to know more things about the system. Although I don't see why this would be an issue (what's the problem with getting to know better the system you use every day?), I can see one clear advantage to the point-and-click work-flow.

When the user wants to open a file, double-clicking on it just works. On the other hand, on the command line, the user needs to call the appropriate application and pass the file name as argument:

evince my_document.pdf

That means that the user has to know which is the application that is going to correctly deal with the given file. For example, this won't work:

evince my_movie.mp4

because evince is a document viewer and it only understands formats as pdf, postscript, dejavu or dvi, but it is not able to understand MPEG-4 videos.

So how can one replicate the point-and-click behavior on the command line? It would be nice to have something like:

open my_file

and that auto-magically the appropriate application was chosen.

It is of course possible, and it can be done in the same way as the graphical desktop environment does it: using xdg-utils.

Inside a desktop environment (e.g. GNOME, KDE, or Xfce), xdg-open simply passes the arguments to that desktop environment's file-opener application (gvfs-open, kde-open, or exo-open, respectively), which means that the associations are left up to the desktop environment. Therefore, on the command line, one can do:

xdg-open my_file.pdf

and the appropriate application will be used (evince on GNOME, okular on KDE, etc.).

When no desktop environment is detected (for example when one runs a standalone window manager, e.g. stumpwm), xdg-open will use its own configuration files.

Sometimes, the default associations between applications and file types may not suit the user. For instance, in my case, MPEG-4 videos were open with mplayer, but I prefer vlc. The xdg-mime tool is meant to help you change the default associations:

xdg-mime default vlc.desktop video/mp4

The vlc.desktop parameter is an xdg desktop file which describes the vlc applications. On my Debian GNU/Linux system, this files are located in /usr/share/applications. So I was able to look in there to see what applications xdg-open could use.

The parameter video/mp4 passed to xdg-mime is the type of file we want to associate the application with. In order to know what type of file we are dealing with, xdg-mime can query it for you:

xdg-mime query my_movie.mp4

And the answer is:

video/mp4

Now, what happens if there is no desktop file for an application I want to use? For instance, remote sensing image visualization is not possible with the standard image viewers available on GNU/Linux. I personally use the OTB IceViewer, which is a lightweight application using the rendering engine developed for Monteverdi2.

In this case, I have to create a desktop file for the application. I my case, I have created the otbice.desktop file in /home/inglada/.local/share/applications/ with the following contents:

[Desktop Entry]
Type=Application
Name=OTB Ice Viewer
Exec=/home/inglada/OTB/builds/Ice/bin/otbiceviewer %U
Icon=otb-logo
Terminal=false
Categories=Graphics;2DGraphics;RasterGraphics;
MimeType=image/bmp;image/gif;image/x-portable-bitmap;image/x-portable-graymap;image/x-portable-pixmap;image/x-xbitmap;image/tiff;image/jpeg;image/png;image/x-xpixmap;image/jp2;image/jpeg2000;

The file has to be executable, so I have to do:

chmod +x /home/inglada/.local/share/applications/otbice.desktop

After that, I can associate the IceViewer with the image formats I want:

xdg-mime default otbice.desktop image/tiff

And it just works.

Footnotes:

And I am one. See here, here and here for example.

09 Aug 2013

Watching YouTube without a browser

The other day I was curious about how many streaming video sites among those I usually visit had made the move from flash to HTML5.

The main reason is that my virtual Richard Stallman keeps telling me that I am evil, so I wanted to remove the flash plug-in from my machine.

An the fact is that many sites have not switched yet. As far as I understand, YouTube has, at least partially. Therefore, the title of this post is somewhat misleading.

I have been using youtube-dl for a while now in order to download videos from many sites as YouTube, Vimeo, etc. My first idea was therefore to pipe the output of youtube-dl to a video player supporting the appropriate format. Since I usually use vlc I just tried:

youtube-dl  -o - <url of the video> | vlc -

This works fine, but has the drawback of needing to manually type (or copy-paste) the URL of the video. So the command line on a terminal emulator is not what I find convenient.

Most of the video URLs I access come to me via e-mail, and I read e-mail with Gnus inside Emacs, I decided that an Emacs command would do the trick. So I put this inside my .emacs:

(defun play-youtube-video (url)
  
  (interactive "sURL: ")
  
  (shell-command
   
   (concat "youtube-dl  -o - " url " | vlc -")))

This way, M-x play-youtube-video, asks me for an URL and plays the video. So, when I have an URL for a video on an e-mail (or any other text document on Emacs) I just copy the URL with the appropriate Emacs key stroke and call this command. For convenience, I have mapped it to key binding:

(global-set-key (kbd "<f9> Y") 'play-youtube-video)

So this is good, though not yet frictionless.

Sometimes, the video URL will appear on an HTML buffer, either when I am browsing the web inside Emacs with w3m or when the e-mail I am reading is sent in HTML¹. In this case, I don't need to manually copy the URL and I just can ask w3m to do the work.

(defun w3m-play-youtube-video ()

  (interactive)

  (play-youtube-video

   (w3m-print-this-url (point))))


(global-set-key (kbd "<f9> y") 'w3m-play-youtube-video)

So far, this has been working fine for me and I don't have the ugly non-free flash plug-in installed, and VRMS is happier than before.

Footnotes:

I use w3m as HTML renderer for Gnus with (setq mm-text-html-renderer 'w3m).