Why Astronomers Should Program in Python
by eric
The training and career outcomes of astronomy students make Python the current best-choice language for new development and analysis scripting. Two realities about academic astronomy allow us to evaluate the success of language choices, and Python is a clear winner.
- Astronomers do not receive any formal training in programming, computer science, or “software carpentry.” While practicing astronomers spend significant time writing code for analysis, few undergraduate or graduate programs require even a semester of instruction in basic programming. A 2008 survey of scientists found that while they spent 30% of their time developing software, 97% were self-taught programmers [Hannay et al. 2009]. (The situation is unfortunately quite similar with statistics.) Instead, programming knowledge is passed down informally within research groups, limiting the development of true expertise.
Astronomers need a language which is beginner-friendly, yet powerful.
- Most astronomy students do not continue into long-term careers as astronomers. While most astronomy PhDs can obtain postdoctoral positions, the number of permanent academic positions available is far lower.
Students’ career prospects outside of astronomy will be improved if they have experience with a language used in other fields.
Of course, the language of choice should also enable the best science possible under time and cost constraints. (These arguments apply for other fields of science to the degree that #1 and 2 are valid.)
Many astronomers currently program in IDL, a proprietary array-based language used mainly in astronomy, geophysics, and medical imaging. Python is clearly better than IDL in terms of power and widespread adoption, and I believe its beginner-friendliness also makes it a better choice than other potential “primary” languages.
Some of Python’s advantages:
- It’s beginner-friendly. Python code is usually straightforward to read, and the language goals focus on clarity. Because Python is an interpreted language, students can learn its syntax interactively without waiting for compilation, greatly speeding the learning process. Basic tutorials are widely available for free on the web, and many questions have answers a quick Google or Stack Overflow search away. Python is free of the memory allocation problems and tricky pointer arithmetic one encounters in C or C++ that can confound the beginner. Finally, as free software, it’s straightforward to install Python on one’s personal computer without the challenges of license files and authentication.
- Python is a language to grow into. Despite being beginner friendly, Python is not a lightweight language. Experience with Python exposes a student to techniques in object-oriented and functional programming styles and introduces a variety of data structures. Debugging tools and unit testing frameworks are readily available.
- With “batteries included,” Python enables powerful analyses. The Python ecosystem is enormous, including both native libraries and interface layers to other packages which allow scientists to leverage others’ work. With libraries for array manipulation, scientific programming, 3D and 2D plotting, numerical methods, web programming, database integration, interfaces to C and FORTRAN code, GUI programming, symbolic math, machine learning, MCMC, network analysis, and much more, the possibilities are limited by one’s imagination.
- Python is widely used in science and industry. The main Python site provides many examples.
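As a rough illustration of the readability point above, here is a minimal sketch that uses only the standard library. It applies the distance modulus to a tiny catalog and picks out the intrinsically brightest star. The star names, magnitudes, and distances are made up for the example, and the function name is hypothetical rather than part of any astronomy package.

```python
import math


def absolute_magnitude(apparent_mag, distance_pc):
    """Apply the distance modulus: M = m - 5 * log10(d / 10 pc)."""
    if distance_pc <= 0:
        raise ValueError("distance must be positive")
    return apparent_mag - 5.0 * math.log10(distance_pc / 10.0)


# Hypothetical catalog rows: (name, apparent magnitude, distance in parsecs).
catalog = [
    ("star_a", 5.0, 10.0),
    ("star_b", 10.0, 100.0),
    ("star_c", 7.5, 1000.0),
]

# Map each star name to its absolute magnitude.
absolute_mags = {name: absolute_magnitude(m, d) for name, m, d in catalog}

# Smaller magnitudes are brighter, so take the minimum.
brightest = min(absolute_mags, key=absolute_mags.get)

for name, mag in absolute_mags.items():
    print(f"{name}: M = {mag:.2f}")
print("Intrinsically brightest:", brightest)
```

Each of these lines can also be typed one at a time at the interactive prompt, which is how a beginner would typically explore it.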
Obviously, Python is not the right choice in all circumstances. If an analysis needs to interface with a large body of existing code, it generally makes sense to work in that language. Similarly, performance constraints may require a compiled language in some cases. Senior astronomers driven by deadlines will want to stick with languages they already know rather than fight with a new one. In general, though, astronomy would be well-served if most new code were written in Python.
While it is easier to train new students using a language one knows, for the reasons above new students should be encouraged to learn Python, even when that creates an “impedance mismatch” between the advisor and the student. Astronomical use of Python is growing rapidly, and groups fluent in it will have an advantage.
Astronomy-specific Python code and guides are proliferating. The CfA has a tutorial, and there are other resources here, here, here, and here. Python interfaces to key tools like IRAF, ds9, and FITS exist, as do translation guides for the IDL Astronomy Library.
Better training and career guidance for students of astronomy are important long-range goals for astronomy departments. Here and now, astronomy students can individually improve their prospects and their science by programming in Python.
Update, 6/1: following up some comments.
No tool is perfect. My main frustrations these days: package installation can be nontrivial (although the Scipy Superpack and the EPD are helpful); it’s hard to get answers from the convoluted matplotlib documentation; and the transition to Python 3 promises to be a headache.
Case in point: The Cuckoo’s Egg by Cliff Stoll, a book about an astronomer hired as an IT guy at Berkeley who tracks down a hacker in the ’80s. It’s a first-person account, apparently real, and a fun read.
Cliff Stoll is hardly typical in any sense 🙂 And he sure didn’t use Python.
Ever heard of PDL? Quite a few astronomers use it. http://pdl.perl.org
I am a geophysicist and the same issues apply to some extent. A lot of new programming work in geophysics is being done in Python.
Unfortunately, I have found starting to work with Python extremely frustrating due to the problems that you mention in your postscript. Installing Python packages is a nightmare, especially when packages require different versions of other packages before they will install. On the Mac, it is even worse because of the way the OS hides the package locations.
About the occasional need to use existing code in, e.g., FORTRAN (in my experience with physicist friends, this is usually what they need, although I dispute that they _really_ need it): one of the greatest aspects of Python is how easy it is to create modules that use such libraries. Tools such as SWIG or Cython make it very easy to take an existing library written in a lower-level language and make it usable from within Python, with all its advantages.
Scientific programming in Python must evolve into detecting your algorithm’s bottlenecks, then implementing those parts in C, or even assembly. Just like people used to do with C and assembly before… The difference is that Python makes it very easy for you to profile your code, for example, and also to use parallel programming and data I/O.
Whatever you can’t make faster in your Python code just by using NumPy, or using multiple machines, should be solved by making small C procedures called from within Python modules.
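For readers who have not profiled Python code before, the workflow described in this comment might look roughly like the sketch below. It uses the standard library's cProfile and pstats modules. The functions being profiled are hypothetical stand-ins for a real analysis, not code from the commenter.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive pure-Python loop; a stand-in for a real bottleneck."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def analysis():
    """Hypothetical analysis step that calls the slow function repeatedly."""
    return [slow_sum_of_squares(20_000) for _ in range(20)]


# Collect timing data only while the analysis runs.
profiler = cProfile.Profile()
profiler.enable()
results = analysis()
profiler.disable()

# Write the report to a string, sorted by cumulative time, top 5 entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
print(profile_text)
```

The function that dominates the cumulative-time column is the candidate for rewriting with NumPy or as a small C routine. Everything else can stay in plain Python.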
nic’s reply makes a lot of sense. I’m currently studying astrophysics at university and we do actually get taught C programming in year 1 and 2, however not beyond that. I mentioned Python to the guy in charge of programming in physics and he just shrugged and said that we should really be using FORTRAN! I think if you were an astrophysicist, you’d probably be modelling, which requires fast programs, whereas if you were an astronomer analysing data, Python would be a good choice. I’m kinda hoping to merge the two, using C for the bottlenecks and Python for everything else.
What OS are you using? All the Linux distros I know of package the Python libraries.
OS X, plus some linux boxes where I don’t have root access.
There are quite a few ways to solve performance bottlenecks in Python without having to resort to C (or other verbose, low-level languages). There are Python libraries that interface with existing C libraries. Cython will let you write Python-like code that gets compiled and runs fast. There is NumPy, of course. Some problems lend themselves well to distributed solutions, spreading the work over multiple machines: image processing, for example. In that case you can look into Celery or pyzmq.
The Python scientific libraries are quite impressive and have allowed me to do a lot of amateur research into genetics, physics, and so forth. I would imagine that the successes I’ve had with Python and science so far could be easily replicated for those studying astronomy. Good post!
Have a look @ FORTH as a good alternative. It’s even on the Shuttles 🙂
You may find Software Carpentry (http://software-carpentry.org) useful — it’s a free online version of a crash course in software skills for scientists and engineers that has a lot of Python (and other things) in it.
Your site is a great resource, Greg, and deserves to be more widely used in the scientific community. Your research was a major influence on the ideas I outlined in point #1 above.
FWIW, I had to take a semester on C during my undergrad at University of Arizona (graduated in ’05). I’m not in science any more, but it was one of the most useful classes I took, providing the basics of programming.