Best Practices for Scientific Computing

24 Dec 2012 | advice programming science software

Shantanu and I gave a short talk titled “Software Carpentry for Scientists” to the graduate students of the Chemical Engineering department at IISc this Friday. We gave a short introduction to Git, TDD, NumPy/SciPy, etc., and mentioned a few things from Greg Wilson et al.’s paper.

I promised to get back to them with links to a few resources, and figured it would be more useful to put the links in a publicly available place.

A summary of the paper by Greg Wilson et al. is below.

Useful resources

Software Carpentry

Paper by Greg Wilson et al.
Software Carpentry

Git & version control

TDD

SciPy

http://scipy-lectures.github.com

Python

http://docs.python.org/tutorial

GUI tools in Python

Summary of paper by Greg Wilson et al.

Write programs for people, not computers
- a program should not require its readers to hold more than a handful of facts in memory at once.
- names should be consistent, distinctive and meaningful.
- code style and formatting should be consistent.
- all aspects of software development should be broken down into tasks roughly an hour long.
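
To make the point about names concrete, here is a small sketch of my own (not from the paper). Both functions compute the same thing. The function names, argument names and the ideal gas example are assumptions I picked for illustration.

```python
# Hard to read: the reader has to remember what a, b and c stand for,
# and what the magic number 8.314 means.
def f(a, b, c):
    return a * b / (8.314 * c)


# Easier to read: the names carry the meaning and the units.
GAS_CONSTANT = 8.314  # J / (mol K), rounded for illustration


def moles_of_ideal_gas(pressure_pa, volume_m3, temperature_k):
    """Return the amount of gas in moles from the ideal gas law, n = PV / (RT)."""
    return pressure_pa * volume_m3 / (GAS_CONSTANT * temperature_k)
```

A call such as `moles_of_ideal_gas(101325, 0.0224, 273.15)` can be understood without opening the function body, which is not true of `f(101325, 0.0224, 273.15)`.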
Automate repetitive tasks
- rely on the computer to repeat tasks
- save recent commands in a file for re-use
- use a build tool to automate scientific workflows
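
A real build tool such as Make does much more than this, but the core idea can be sketched in a few lines of Python: only redo a step when its output is missing or older than its inputs. The helper names `is_stale` and `build` are hypothetical, and this is a rough illustration rather than a replacement for a proper build tool.

```python
import os


def is_stale(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(source) > target_time for source in sources)


def build(target, sources, action):
    """Call action(target, sources) only when target is out of date.

    Returns True if the action ran and False if the target was up to date.
    """
    if is_stale(target, sources):
        action(target, sources)
        return True
    return False
```

Here `action` would be whatever function regenerates the output, for example one that reads a raw data file and writes a plot. Running the script twice in a row should then do the expensive work only once.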
Use the computer to record history
- software tools should be used to track computational work automatically.
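
One simple way to do this, sketched below under my own assumptions, is to store a small provenance record next to every result file. The function names and the fields in the record are hypothetical; a real project would probably also record the version control revision of the code and the versions of the libraries it uses.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def provenance_record(parameters):
    """Collect basic facts about how a result was produced."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "command_line": list(sys.argv),
        "parameters": parameters,
    }


def save_with_provenance(path, results, parameters):
    """Write results to a JSON file together with their provenance record."""
    document = {
        "provenance": provenance_record(parameters),
        "results": results,
    }
    with open(path, "w") as handle:
        json.dump(document, handle, indent=2)
```

Because the record is written by the program itself, it does not depend on anyone remembering to update a lab notebook.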
Make incremental changes
- work in small steps with frequent feedback and course correction
Use version control
- use a version control system
- everything that has been created manually should be put in version control
Don’t repeat yourself (or others)
- every piece of data must have a single authoritative representation in the system
- code should be modularized rather than copied and pasted
- re-use code instead of rewriting it
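
As a small illustration (my own, with hypothetical names), the offset between Celsius and Kelvin below is defined exactly once, and the conversion lives in one function that everything else calls instead of repeating `+ 273.15` all over the code.

```python
# The single authoritative definition; nothing else hard-codes 273.15.
KELVIN_OFFSET = 273.15


def celsius_to_kelvin(temperature_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return temperature_c + KELVIN_OFFSET


def mean_temperature_k(readings_c):
    """Return the mean of Celsius readings, expressed in kelvin."""
    kelvin_readings = [celsius_to_kelvin(reading) for reading in readings_c]
    return sum(kelvin_readings) / len(kelvin_readings)


def hottest_temperature_k(readings_c):
    """Return the highest of Celsius readings, expressed in kelvin."""
    return max(celsius_to_kelvin(reading) for reading in readings_c)
```

If the conversion ever needs to change, say to validate its input, there is only one place to edit.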
Plan for mistakes
- add assertions to programs to check their operation
- use an off-the-shelf unit testing library
- use all available oracles when testing programs
- turn bugs into test cases
- use a symbolic debugger
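
Here is a minimal sketch of the first few points using Python's built-in `unittest` library. The function `mole_fractions` and the bug described in the regression test are hypothetical; they are only there to show what assertions and "turn bugs into test cases" look like in practice.

```python
import unittest


def mole_fractions(amounts_mol):
    """Return each amount divided by the total amount."""
    # Preconditions: check the input before using it.
    assert len(amounts_mol) > 0, "need at least one component"
    assert all(amount >= 0 for amount in amounts_mol), "amounts must be non-negative"
    total = sum(amounts_mol)
    assert total > 0, "total amount must be positive"

    fractions = [amount / total for amount in amounts_mol]

    # Postcondition: mole fractions must add up to one.
    assert abs(sum(fractions) - 1.0) < 1e-9
    return fractions


class MoleFractionsTest(unittest.TestCase):
    def test_equal_amounts(self):
        self.assertEqual(mole_fractions([1.0, 1.0]), [0.5, 0.5])

    def test_regression_all_zero_amounts(self):
        # Hypothetical bug report: an all-zero input once crashed with a
        # confusing ZeroDivisionError. This test pins down the intended
        # behaviour so the bug cannot quietly come back.
        with self.assertRaises(AssertionError):
            mole_fractions([0.0, 0.0])
```

Saved in a file, the tests can be run with `python -m unittest`. Keep in mind that Python skips `assert` statements when started with the `-O` flag, so assertions are a safety net during development rather than a substitute for input validation.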
Optimize software only after it works correctly
- use a profiler to identify bottlenecks
- write code in the highest-level language possible
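
Python ships with a profiler in the standard library, `cProfile`. The sketch below wraps it in a hypothetical helper, `profile_call`, and profiles a deliberately naive function. Both names are mine and only serve as an example.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop, used here as something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()
```

Printing the report from `profile_call(slow_sum_of_squares, 100000)` shows where the time goes. Only after seeing that is it worth deciding whether to rewrite a hot spot, for example with NumPy.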
Document design and purpose, not mechanics
- document interfaces and reasons, not implementations
- refactor code instead of explaining how it works
- embed the documentation for a piece of software in that software
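
In Python, the usual way to embed documentation in the software is a docstring. The example below is my own sketch with hypothetical names. The docstring describes the interface (arguments, units, return value) and the reason for a design choice, and leaves the mechanics to the code.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)


def arrhenius_rate_constant(pre_exponential, activation_energy_j_mol, temperature_k):
    """Return the rate constant k = A * exp(-Ea / (R * T)).

    Arguments:
        pre_exponential: the factor A, in the same units as the result.
        activation_energy_j_mol: the activation energy Ea, in J/mol.
        temperature_k: the absolute temperature T, in kelvin.

    Temperatures are taken in kelvin rather than Celsius so that callers
    cannot accidentally pass a non-positive absolute temperature unnoticed.
    """
    if temperature_k <= 0:
        raise ValueError("temperature_k must be positive")
    exponent = -activation_energy_j_mol / (GAS_CONSTANT * temperature_k)
    return pre_exponential * math.exp(exponent)
```

Running `help(arrhenius_rate_constant)` in the interpreter prints this text, so the documentation travels with the code.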
Collaborate
- use pre-merge code reviews
- use pair programming when bringing someone new up to speed and when tackling particularly tricky problems
- use an issue tracking tool