Why Astronomers Should Program in Python

by eric

The training and career outcomes of astronomy students make Python the best current choice for new development and analysis scripting. Two realities of academic astronomy provide criteria for judging a language choice, and by those criteria Python is a clear winner.

  1. Most astronomers receive no formal training in programming, computer science, or “software carpentry.” Although practicing astronomers spend significant time writing analysis code, few undergraduate or graduate programs require even a semester of basic programming instruction. A 2008 survey of scientists found that they spent 30% of their time developing software, yet 97% were self-taught programmers [Hannay et al. 2009]. (The situation is unfortunately quite similar for statistics.) Programming knowledge is instead passed down informally within research groups, which limits the development of true expertise.
    Astronomers need a language that is beginner-friendly yet powerful.
  2. Most astronomy students do not continue into long-term careers as astronomers. While most astronomy PhDs can obtain postdoctoral positions, the number of permanent academic positions available is far lower.
    Students’ career prospects outside of astronomy will be improved if they have experience with a language used in other fields.

Of course, the language of choice should also enable the best possible science under time and cost constraints. (These arguments apply to other fields of science to the extent that points 1 and 2 hold there as well.)

Many astronomers currently program in IDL, a proprietary array-based language used mainly in astronomy, geophysics, and medical imaging. Python clearly beats IDL in power and breadth of adoption, and I believe its beginner-friendliness also makes it a better choice than other potential “primary” languages.

Some of Python’s advantages:

  • It’s beginner-friendly. Python code is usually straightforward to read, and the language’s design emphasizes clarity. Because Python is interpreted, students can learn the syntax interactively without waiting for compilation, which greatly speeds up learning. Basic tutorials are widely available for free on the web, and many questions are answered by a quick Google or Stack Overflow search. Python also spares the beginner the manual memory management and tricky pointer arithmetic of C or C++. Finally, because it is free software, Python is easy to install on a personal computer without the hassle of license files and authentication.
  • Python is a language to grow into. Despite being beginner-friendly, Python is not a lightweight language. Experience with Python exposes a student to object-oriented and functional programming techniques and introduces a variety of data structures. Debugging tools and unit testing frameworks are readily available.
  • With “batteries included,” Python enables powerful analyses. The Python ecosystem is enormous, including both native libraries and interface layers to other packages which allow scientists to leverage others’ work.  With libraries for array manipulation, scientific programming, 3D and 2D plotting, numerical methods, web programming, database integration, interfaces to C and FORTRAN code, GUI programming, symbolic math, machine learning, MCMC, network analysis, and much more, the possibilities are limited by one’s imagination.
  • Python is widely used in science and industry. The main Python site provides many examples.
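To make the points above a little more concrete, here is a small sketch of the kind of script a student might write early on, in present-day Python 3 using only the standard library. The `Star` class, the `flux_to_magnitude` helper, the star names, and the sample fluxes are all made up for illustration, and the zero-point flux of 1.0 is an arbitrary assumption. It touches a simple class, a comprehension, and tests written with the built-in `unittest` framework, using the standard relation m = -2.5 log10(F / F0).

```python
import math
import unittest
from dataclasses import dataclass


@dataclass
class Star:
    """Hypothetical record pairing a star's name with a measured flux."""
    name: str
    flux: float  # arbitrary units, relative to the assumed zero point below


def flux_to_magnitude(flux, zero_point_flux=1.0):
    """Convert a flux to an apparent magnitude: m = -2.5 log10(F / F0).

    The default zero point of 1.0 is an illustrative assumption.
    """
    if flux <= 0:
        raise ValueError("flux must be positive")
    return -2.5 * math.log10(flux / zero_point_flux)


# Made-up sample data, for illustration only.
stars = [Star("A", 1.0), Star("B", 100.0), Star("C", 0.01)]

# A dict comprehension reads almost like the English description of the task.
magnitudes = {s.name: flux_to_magnitude(s.flux) for s in stars}


class TestMagnitudes(unittest.TestCase):
    def test_factor_of_100_is_5_magnitudes(self):
        # A flux ratio of 100 corresponds to exactly 5 magnitudes;
        # the fainter star has the larger magnitude.
        self.assertAlmostEqual(magnitudes["C"] - magnitudes["A"], 5.0)

    def test_nonpositive_flux_rejected(self):
        with self.assertRaises(ValueError):
            flux_to_magnitude(0.0)
```

Saved as, say, `magnitudes.py`, the tests run with `python -m unittest magnitudes`.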

Obviously, Python is not the right choice in all circumstances.  If an analysis needs to interface with a large body of existing code, it generally makes sense to work in that language.  Similarly, performance constraints may require a compiled language in some cases.  Senior astronomers driven by deadlines will want to stick with languages they already know rather than fight with a new one.  In general, though, astronomy would be well-served if most new code were written in Python.

While it is easier for advisors to train new students in a language they already know, for the reasons above new students should be encouraged to learn Python, even when that creates an “impedance mismatch” between advisor and student. Astronomical use of Python is growing rapidly, and groups fluent in it will have an advantage.

Astronomy-specific Python code and guides are proliferating. The CfA has a tutorial, and several other guides are available online. Python interfaces exist for key tools such as IRAF and ds9 and for the FITS file format, as do translation guides for the IDL Astronomy Library.

Better training and career guidance for students of astronomy are important long-range goals for astronomy departments.  Here and now, astronomy students can individually improve their prospects and their science by programming in Python.

Update, 6/1: following up on some comments.

No tool is perfect. My main frustrations these days: package installation can be nontrivial (although the Scipy Superpack and the Enthought Python Distribution (EPD) are helpful); it’s hard to get answers from the convoluted matplotlib documentation; and the transition to Python 3 promises to be a headache.