25 Mar 2015

Scripting in C++

Introduction

This quote of B. Klemens¹ that I used in my previous post made me think a little bit:

"I spent much of my life ignoring the fundamentals of computing and just hacking together projects using the package or language of the month: C++, Mathematica, Octave, Perl, Python, Java, Scheme, S-PLUS, Stata, R, and probably a few others that I’ve forgotten."

Although much less skilled than M. Klemens, I have also wandered around the programming language landscape, out of curiosity, but also with particular needs in mind.

For example, I listed a set of features that I found useful for easily going from prototyping to operational code for remote sensing image processing. The main 3 bullet points were:

OTB accessibility, which meant C++, Python or Java (and its derivatives as Clojure);
Robust concurrent programming, which ruled out Python;
Having a REPL, which led me to say silly things and even develop working code.

Before going the OTB-from-LISP-through-bindings route, I had looked for C++ interpreters to come close to a REPL, some are available, but, at least at the time, I didn't manage to get one running.

Since I started using C++11 a little bit, I find more and more pleasant to write C++ even for small things that I would usually use Python for. Add to that all the new things in the standard library, like threads, regular expressions, tuples and proper random number generators, just to name a few, and here I am wondering how to make the edit/compile/run loop shorter.

B. Klemens proposes crazy things like compiling via cut/paste, but this is too extreme even for me. Since I use emacs, I can compile from within the editor just by hitting C-c C-c which asks for a command to compile. I usually do things like:

export ITK_AUTOLOAD_PATH="" && cd ~/Dev/builds/otb-bv/ && make \
    && ctest -I 8,8 -VV && gnuplot ~/Dev/otb-bv/data/plot_bvMultiT.gpl

I am nearly there, but not yet.

Remember that I discovered Klemens because I was looking for statistical libs after watching some videos of the Coursera Data Science specialization? One of the courses is about reproducible research and they show how you can mix together R code and Markdown to produce executable research papers.

When I saw this R Markdown approach, I thought, well, that's nice, but inferior to org-babel. The main advantage of org-babel with respect to R Markdown or IPython is that in org-mode you can use different languages in the same document.

In a recent research report written in org-mode and exported to PDF, I used, in the same document, Python, bash, gnuplot and maxima. This is a real use case. Of course, these are all interpreted languages, so it is easy to embed them in documents and talk to the interpreter.

Well, 15 years after I started using emacs and more than 6 years after discovering org-mode, I found that compiled languages can be used with org-babel. At least, decent compiled languages, since there is no mention of Objective-C or Java².

Setting up and testing

So how one does use C++ with org-babel? Just drop this into your emacs configuration:

Yes, don't forget the flag for C++11, since you want to be cool.

Now, let the fun begin. Open an org buffer and write:

#+begin_src C++ :includes <iostream>
std::cout << "Hello scripting world\n";
#+end_src

Go inside the code block and hit C-c C-c and you will magically see this appear just below the code block:

#+RESULTS:
: Hello scripting world

No problem. You are welcome. Good-bye.

If you want to go beyond the Hello world example keep reading.

Using the standard library

You have noticed that the first line of the code block says it is C++ and gives information about the header files to include. If you need to include more than one header file³, just make a list of them (remember, emacs is a LISP machine):

#+begin_src C++ :includes (list "<iostream>" "<vector>")
std::vector<int> v{1,2,3};
for(auto& val : v) std::cout << val << "\n";
#+end_src

#+RESULTS:
| 1 |
| 2 |
| 3 |

Do you see the nice formatted results? This is an org-mode table. More fun with the standard library and tables:

#+begin_src C++ :includes (list "<iostream>" "<string>" "<map>")
std::map<std::string, int> m{{"pizza",1},{"muffin",2},{"cake",3}};
for(auto& val : m) std::cout << val.first << " " << val.second << "\n";
#+end_src

#+RESULTS:
| cake   | 3 |
| muffin | 2 |
| pizza  | 1 |

Using other libraries

Sometimes, you may want to use other libraries than the C++ standard one⁴. Just give the header files as above and also the flags (here I split the header of the org-babel block in 2 lines for readability):

#+header: :includes (list "<gsl/gsl_fit.h>" "<stdio.h>") 
#+begin_src C++ :exports both :flags (list "-lgsl" "-lgslcblas")
int i, n = 4;
double x[4] = { 1970, 1980, 1990, 2000 };
double y[4] = {   12,   11,   14,   13 };
double w[4] = {  0.1,  0.2,  0.3,  0.4 };

double c0, c1, cov00, cov01, cov11, chisq;

gsl_fit_wlinear (x, 1, w, 1, y, 1, n, 
                 &c0, &c1, &cov00, &cov01, &cov11, 
                 &chisq);

printf ("# best fit: Y = %g + %g X\n", c0, c1);
printf ("# covariance matrix:\n");
printf ("# [ %g, %g\n#   %g, %g]\n", 
        cov00, cov01, cov01, cov11);
printf ("# chisq = %g\n", chisq);

for (i = 0; i < n; i++)
  printf ("data: %g %g %g\n", 
          x[i], y[i], 1/sqrt(w[i]));

printf ("\n");

for (i = -30; i < 130; i++)
  {
  double xf = x[0] + (i/100.0) * (x[n-1] - x[0]);
  double yf, yf_err;

  gsl_fit_linear_est (xf, 
                      c0, c1, 
                      cov00, cov01, cov11, 
                      &yf, &yf_err);

  printf ("fit: %g %g\n", xf, yf);
  printf ("hi : %g %g\n", xf, yf + yf_err);
  printf ("lo : %g %g\n", xf, yf - yf_err);
  }
#+end_src

#+RESULTS:
| #     | best       |    fit: |       Y | = | -106.6 | + | 0.06 | X |
| #     | covariance | matrix: |         |   |        |   |      |   |
| #     | [          |  39602, |   -19.9 |   |        |   |      |   |
| #     | -19.9,     |   0.01] |         |   |        |   |      |   |
| #     | chisq      |       = |     0.8 |   |        |   |      |   |

etc.

Defining things outside main

You may have noticed that we are really scripting like perl hackers, since there is no main() function in our code snippets. org-babel kindly generates all the boilerplate for us. Thank you.

But wait, what happens if I want to define or declare things outside main?

No problem, just tell org-babel not to generate a main():

#+begin_src C++ :includes <iostream> :main no
struct myStr{int i;};

int main()
{
myStr ms{2};
std::cout << "The value of the member is " << ms.i << std::endl;
}
#+end_src

#+RESULTS:
: The value of the member is 2

There are many other options you can play with and I suggest to read the documentation. Just 2 more useful things.

Using org-mode tables as data

You saw that org-mode has tables. They are very powerful and can even be used as a spreadsheet. Let's see how they can be used as input to C++ code.

Let's start by giving a name to our table:

#+tblname: somedata
|  sqr | noise |
|------+-------|
|  0.0 |  0.23 |
|  1.0 |  1.31 |
|  4.0 |  4.61 |
|  9.0 |  9.05 |
| 16.0 | 16.55 |

Now, we can refer to it from org-babel using variables:

#+header: :exports results :includes <iostream>
#+begin_src C++ :var somedata=somedata :var somedata_cols=2 :var somedata_rows=5
for (int i=0; i<somedata_rows; i++)
  {
  for (int j=0; j<somedata_cols; j++) 
    {
    std::cout << somedata[i][j]/2.0 << "\t";
    }
  std::cout << "\n";
  }
#+end_src

#+RESULTS:
|   0 | 0.115 |
| 0.5 | 0.655 |
|   2 | 2.305 |
| 4.5 | 4.525 |
|   8 | 8.275 |

Tangle when happy

Once you have scripted your solution to save the world from hunger⁵, you may want to get it out from org-mode and upload it to github⁶. You can of course copy and paste the code, but since you are using a superior editor, you can ask it to do that⁷ for you. Just tell it the name of your file using the tangle option:

#+begin_src C++ :tangle MyApp.cpp :main no :includes <iostream>
namespace ns {
    class MyClass {};
}

int main(int argc, char* argv[])
{
ns::MyClass{};
std::cout << "hi" << std::endl;
return 0;
}
#+end_src

If you the call org-babel-tangle from this block, it will generate a file with the right name containing this:

#include <iostream>
namespace ns {
    class MyClass {};
}

int main(int argc, char* argv[])
{
ns::MyClass{};
std::cout << "hi" << std::endl;
return 0;
}

Conclusion

Since reproducible research in emacs has entered the realm of scientific journals⁸, I will end as follows:

In this article, we have shown the feasibility of coding in C++ without writing much boilerplate and going through the edit/compile/run loop. With the advent of modern C++ standards and richer standard libraries, C++ is a good candidate to quickly write throw-away code. The integration of C++ in org-babel gets C++ even closer to the tasks for which scripting languages are used for.

Future work will include learning to use the C++ standard library (data structures and algorithms), browsing the catalog of BOOST libraries, trying to get good C++ developers to use emacs and C++11, and find a generic variadic-template-based solution to world hunger.

Footnotes:

Ben Klemens, Modeling with Data: Tools and Techniques for Scientific Computing, Princeton University Press, 2009, ISBN: 9780691133140.

Well, this is just trolling, since Java is in the list of supported languages, and I guess that you can use gcc for Objective-C.

This may happen if you are writing code for a nuclear power plant.

⁴

To guide nuclear missiles towards the nuclear power plant of the evil guys, for instance.

⁵

Just send some nuclear bombs to those who are hungry?

⁶

Isn't that cool, to be on the same cloud as all those Javascript programmers?

⁷

But tell it to make coffee first.

⁸

Eric Schulte, Dan Davison, Tom Dye, Carsten Dominik. A Multi-Language Computing Environment for Literate Programming and Reproducible Research Journal of Statistical Software http://www.jstatsoft.org/v46/i03