Best Practices for Scientific Computing

Shantanu and I gave a short talk titled "Software Carpentry for Scientists" for the graduate students of the Chemical Engineering department at IISc this Friday. We introduced Git, TDD, NumPy/SciPy, etc., and mentioned a few points from Greg Wilson et al.'s paper.

I promised to get back to them with links to a few resources, and figured it would be more useful to put them somewhere publicly available.

A summary of the paper by Greg Wilson et al. is below.

Summary of the paper by Greg Wilson et al.

  1. Write programs for people, not computers
    • a program should not require its readers to hold more than a handful of facts in memory at once
    • names should be consistent, distinctive and meaningful
    • code style and formatting should be consistent
    • all aspects of software development should be broken down into tasks roughly an hour long
  2. Automate repetitive tasks
    • rely on the computer to repeat tasks
    • save recent commands in a file for re-use
    • use a build tool to automate scientific workflows
  3. Use the computer to record history
    • software tools should be used to track computational work automatically
  4. Make incremental changes
    • work in small steps with frequent feedback and course correction
  5. Use version control
    • use a version control system
    • everything that has been created manually should be put in version control
  6. Don't repeat yourself (or others)
    • every piece of data must have a single authoritative representation in the system
    • code should be modularized rather than copied and pasted
    • re-use code instead of rewriting it
  7. Plan for mistakes
    • add assertions to programs to check their operation
    • use an off-the-shelf unit testing library
    • use all available oracles when testing programs
    • turn bugs into test cases
    • use a symbolic debugger
  8. Optimize software only after it works correctly
    • use a profiler to identify bottlenecks
    • write code in the highest-level language possible
  9. Document design and purpose, not mechanics
    • document interfaces and reasons, not implementations
    • refactor code instead of explaining how it works
    • embed the documentation for a piece of software in that software
  10. Collaborate
    • use pre-merge code reviews
    • use pair programming when bringing someone new up to speed and when tackling particularly tricky problems
    • use an issue tracking tool
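To make point 7 ("Plan for mistakes") and the documentation advice in point 9 a little more concrete, here is a minimal Python sketch of my own; the paper does not prescribe this code. The `normalize` function and its tests are hypothetical illustrations. The docstring documents the interface rather than the implementation, an assertion checks the function's own operation, the standard-library `unittest` module is the off-the-shelf testing library, exact `fractions.Fraction` arithmetic serves as an independent oracle, and one test pins down an imagined bug so it cannot quietly return.

```python
import math
import unittest
from fractions import Fraction


def normalize(weights):
    """Scale non-negative weights so that they sum to 1.

    Interface: takes a non-empty sequence of finite, non-negative numbers
    with a positive sum and returns each weight as a fraction of the total.
    Raises ValueError for any other input.
    """
    total = math.fsum(weights)
    if not weights or not math.isfinite(total) or total <= 0 or min(weights) < 0:
        raise ValueError("weights must be finite, non-negative, with a positive sum")
    result = [w / total for w in weights]
    # Assertion: a sanity check on the function's own operation.
    assert math.isclose(math.fsum(result), 1.0), "fractions should sum to 1"
    return result


class TestNormalize(unittest.TestCase):
    def test_simple_case(self):
        self.assertEqual(normalize([1, 1, 2]), [0.25, 0.25, 0.5])

    def test_against_exact_oracle(self):
        # Independent oracle: exact rational arithmetic from the standard library.
        weights = [1, 2, 3, 7]
        expected = [Fraction(w, sum(weights)) for w in weights]
        for got, want in zip(normalize(weights), expected):
            self.assertAlmostEqual(got, float(want))

    def test_all_zero_weights_regression(self):
        # Hypothetical bug turned into a test case: all-zero weights once
        # caused a ZeroDivisionError instead of a clear error message.
        with self.assertRaises(ValueError):
            normalize([0, 0])
```

Saved as, say, `normalize.py` (a hypothetical file name), the tests run with `python -m unittest normalize`. The input check raises `ValueError` instead of using `assert` because Python skips `assert` statements under `python -O`; assertions are best kept for internal sanity checks like the one on the result.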

3 tips for those shipping (commercial) apps

Here are some very generic (and paraphrased) notes from a short talk by Deepankar Sharma today.

  1. Whenever you release a new major version, keep a copy of the whole "ecosystem" needed to run it, so that at any point in time you can run any version of your software.
  2. When writing benchmarks, tests, etc., cover a broad spectrum of test data, so that you replicate the different kinds of data your users could have.
  3. Don't develop applications with modes. Think carefully before adding a new mode to your application, since each one effectively adds another code path to maintain.
  4. (Bonus) Beware of too much extensibility
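Tip 2 lends itself to table-driven tests. The sketch below is my own illustration, not something from the talk: `word_count` is a hypothetical stand-in for an application feature, and the cases are invented examples of input variety (empty, whitespace-only, non-ASCII, very large). Python's `unittest.TestCase.subTest` reports every failing case instead of stopping at the first one.

```python
import unittest


def word_count(text):
    """Count whitespace-separated words in a string (hypothetical app feature)."""
    return len(text.split())


# A deliberately varied set of inputs, standing in for the different
# kinds of data real users might feed the application.
CASES = [
    ("", 0),                            # empty input
    ("   \t\n ", 0),                    # whitespace only
    ("hello", 1),                       # single word
    ("hello   world", 2),               # repeated spaces
    ("line one\nline two\ttabbed", 5),  # mixed newlines and tabs
    ("naïve café 東京", 3),             # non-ASCII text
    ("word " * 100_000, 100_000),       # very large input
]


class TestWordCount(unittest.TestCase):
    def test_varied_inputs(self):
        for text, expected in CASES:
            # subTest labels each case so a failure report says which input broke.
            with self.subTest(text=text[:20]):
                self.assertEqual(word_count(text), expected)
```

Adding a new kind of user data then means adding one line to `CASES`, not writing a new test method.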

Advice - Programming in Elisp

Below is an email Eric Schulte sent to the Org-mode mailing list in reply to a question about how to write Elisp for Org-mode. I am reproducing it here because I find the advice useful. The actual thread is here.


The way that I learned how to program in emacs lisp was mainly using two commands `elisp-index-search' bound to `C-h e' on my system, and most importantly `describe-function' bound to `C-h f'. With `describe-function' you can look at the source code of functions whose behavior you are familiar with, you can then copy portions of the code to your scratch buffer where they can be edited and evaluated with `eval-defun' bound to `C-M-x'. Now with Babel, instead of doing this in the scratch buffer you could do this in emacs-lisp code blocks in an org file, enabling notes and hierarchical organization – it can be nice to have your noodling all collected in one file for later reference.

If you are going to do any serious work with lisp, I would emphatically recommend using paredit-mode, and becoming friends with the Sexp movement functions:

    C-M-f runs the command paredit-forward
    C-M-b runs the command paredit-backward
    C-M-u runs the command backward-up-list
    C-M-k runs the command kill-sexp
    C-y runs the command yank

They allow you to manipulate lisp code on the level of logical expressions, the utility of which cannot be overstated.

As for working with Org-mode in particular, I'd recommend looking at the documentation and source-code of Org-mode functions with `describe-function', and then looking for how these functions are actually used in the Org-mode code base with `rgrep'.

For a more structured learning experience, I've heard very good things about http://www.gnu.org/software/emacs/emacs-lisp-intro/, although I haven't used it myself.

Hope this helps. Happy Hacking – Eric