In the most recent issue of the Bulletin of the American Meteorological Society (BAMS), Johnny Lin (a well-known Python guru) wrote a great article explaining why Python is the 'new wave' in earth sciences computing. It would be fair to say that this is a widely held view, particularly considering that the AMS has hosted an annual Python symposium for the last three years running. While nice features like the elegant syntax and open-source nature of Python have certainly contributed to its popularity, the main reason why Johnny Lin and others are so bullish about Python is the fact that you can do pretty much everything with it.
Consider this hypothetical workflow, typical of the weather/climate sciences:
* Run a model (Fortran)
* Manage the output files, including simple post-processing (shell scripting, etc.)
* Analyse and visualise the data (IDL, Matlab, etc.)
While all of the tools in this workflow (indicated in parentheses) are very useful and effective at performing their specific task, each is isolated from the others, meaning that communication between tools can only occur through files. When using Python, on the other hand, every tool can be used in the same environment. This means that you can access any variable at any time, thus eliminating the need to read/write a new file at every step along the way. It also means that you'll only have one Python script to work with, rather than a complicated mess of shell scripts, compiled code, makefiles and IDL/Matlab/NCL scripts. Typically, there either exists a tool written in Python that you can 'import' into your environment to complete a specific task (see the list of common tools at Step 3 below), or you can create a 'wrapper' for code written in another language, so that you can execute it in your Python environment. Since Python runs much slower than compiled code, it is common to write a Fortran or C wrapper when speed is important. Wrappers are also a nice way of accessing existing code written in a different language, when you can't be bothered re-writing it in Python.
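To make the single-environment idea concrete, here is a minimal sketch of the three-step workflow collapsed into one Python script. The 'model' here is just a stand-in function that generates synthetic data (all function names and numbers are hypothetical, chosen purely for illustration); the point is that each stage hands its result directly to the next, with no intermediate files.

```python
import numpy as np

def run_model(ntime=120, nlat=10, nlon=20, seed=0):
    """Stand-in for a real model run: synthetic monthly temperatures (K)."""
    rng = np.random.default_rng(seed)
    return 288.0 + rng.normal(0.0, 5.0, size=(ntime, nlat, nlon))

def post_process(data):
    """Simple post-processing: mask any unphysical values."""
    return np.ma.masked_outside(data, 150.0, 350.0)

def analyse(data):
    """Analysis: area-mean time series and its long-term mean."""
    series = data.mean(axis=(1, 2))
    return series, series.mean()

# All three 'tools' share one environment, so every variable stays
# accessible at every step -- no files passed between stages.
raw = run_model()
clean = post_process(raw)
series, long_term_mean = analyse(clean)
print(series.shape, round(float(long_term_mean), 1))
```

In a real workflow `run_model` might instead wrap compiled Fortran, but the structure of the script would be the same.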
When I mention to people that I use Python, they commonly respond by saying that they use some other language (or combination of languages) at the moment, but intend to make the move to Python in the near future. Since the most difficult part of learning a new language is often figuring out where to start, I've put together a 4-step guide to getting started with Python:
STEP 1: LEARN THE BASICS OF PYTHON PROGRAMMING
The best beginners' guide for people working in the weather/climate sciences is (in my opinion) the recent text from Johnny Lin (available as a free download). Since Johnny was an atmospheric scientist first, and a programmer second, he has a great understanding of what people working in our field do (and don't) need to know about Python. He not only explains the language, but also guides you through the process of actually installing Python on your machine, which can be a non-trivial task. For other, more advanced references, see my recommended resources.
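For a taste of what the basics look like, here's the kind of elementary syntax such an introduction covers (a generic snippet of my own, not taken from the book):

```python
# Lists, functions and list comprehensions: bread-and-butter Python.
temps_kelvin = [271.2, 288.5, 300.1]

def kelvin_to_celsius(temp):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return temp - 273.15

# A list comprehension applies the function to every element in one line.
temps_celsius = [kelvin_to_celsius(t) for t in temps_kelvin]
print(temps_celsius)
```

Even this tiny example shows off the readable, compact syntax that draws people to the language.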
STEP 2: INSTALL UV-CDAT, IRIS AND CARTOPY
There are literally thousands of Python packages out there, written for all sorts of computing applications. In order to create an effective working environment, you basically need to collect all the packages that are relevant to your work (e.g. a visualisation package, a statistics package, a netCDF input/output package, etc), install them in such a way that they all interact nicely, then write a whole heap of new functions/modules in order to do typical weather/climate science tasks with those packages. This process would be a time-consuming nightmare, so it's fortunate that a couple of the major institutions in our field have done most of the work for us.
With respect to data analysis, the US Department of Energy has developed the Climate Data Analysis Tools (CDAT). As Johnny Lin describes in the final chapter of his beginners' guide, CDAT is a veritable Swiss Army knife: a collection of all the packages/modules you could ever need or want for weather/climate data processing (i.e. for dealing with netCDF files, performing common statistical tasks, etc). While CDAT has been around for over a decade, it has recently expanded to become the Ultrascale Visualisation-Climate Data Analysis Tools (UV-CDAT). In other words, now that they've produced a quality suite of data analysis tools, they've shifted focus towards improving their visualisation interface.
While the UV-CDAT interface is great for quickly viewing your data while you analyse it, it's probably not the tool you're going to use to produce highly customised, publication-quality figures. For that you could turn to the general-purpose Python plotting packages that come with the UV-CDAT install; however, there's a fair bit of wrestling involved in getting them to produce many of the types of figures common to the weather/climate sciences. Recognising this burden on scientists, the team of software engineers at the UK Met Office recently released Iris and Cartopy, which build on the general Python packages to produce a set of tools that are highly specific to the weather/climate sciences (see the SciTools website for details, plus a presentation from SciPy 2013).
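To show what the general-purpose route involves, here's a minimal matplotlib sketch that plots a synthetic field on a plain lat/lon grid (the field and file name are made up for illustration). Note what's missing: coastlines, map projections, and netCDF-aware axes, which is precisely the gap that packages like Cartopy and Iris fill.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Synthetic 2D field on a regular latitude/longitude grid.
lon = np.linspace(0, 360, 73)
lat = np.linspace(-90, 90, 37)
LON, LAT = np.meshgrid(lon, lat)
field = np.cos(np.radians(LAT)) * np.sin(np.radians(LON))

# A basic filled-contour plot: no projection, no coastlines --
# just data on rectangular axes.
fig, ax = plt.subplots()
contours = ax.contourf(LON, LAT, field, levels=11)
fig.colorbar(contours, ax=ax, label="arbitrary units")
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
fig.savefig("example_field.png")
```

Getting from here to a properly projected map with coastlines is exactly the 'wrestling' referred to above.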
STEP 3: FAMILIARISE YOURSELF WITH THE KEY PACKAGES THAT COME WITH UV-CDAT
There's a daunting number of packages that come with UV-CDAT, so here's a list of the ones that you'll probably use most frequently:
* Interactive interpreter: One of Python's most useful features is its interactive interpreter, because you can simply test ideas at the command line instead of writing a test script. The interpreter supplied with the standard Python distribution is fairly limited, so more capable alternatives have been developed.
* Common processing tasks: There are packages to help you with data management (e.g. file I/O, variables, types, metadata, grids), simple data processing (e.g. spatial and temporal averages, custom seasons, climatologies), general statistics (e.g. correlation, linear regression), Empirical Orthogonal Function analysis, and common quantities calculated from the wind field, such as the streamfunction and velocity potential.
* Linking with other languages: There are packages to help you interact with Fortran, NCL, GrADS and CDO, just to name a few. At the moment these packages are not included in the standard UV-CDAT install, so you'd have to install them yourself.
* Interactive data visualisation: As previously mentioned, when the original CDAT project grew to become UV-CDAT, the biggest advance was in the ability to view your data interactively. To see what I mean, check out the available video clip examples.
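As a rough illustration of the kind of processing those packages streamline, here are two of the tasks mentioned above (a monthly climatology and a least-squares linear trend) sketched in plain numpy. This is a generic stand-in with invented synthetic data, not the actual UV-CDAT API, which wraps these operations in far more convenient, metadata-aware calls.

```python
import numpy as np

rng = np.random.default_rng(42)
nyears = 30
months = np.arange(nyears * 12)

# Synthetic monthly-mean temperatures: seasonal cycle + weak trend + noise.
seasonal = 10.0 * np.sin(2 * np.pi * months / 12.0)
trend = 0.002 * months  # warming of 0.002 K per month
data = 288.0 + seasonal + trend + rng.normal(0.0, 1.0, months.size)

# Monthly climatology: average all Januaries, all Februaries, and so on.
climatology = data.reshape(nyears, 12).mean(axis=0)

# Anomalies relative to the climatology, then a least-squares linear fit
# to recover the imposed trend.
anomalies = data - np.tile(climatology, nyears)
slope, intercept = np.polyfit(months, anomalies, 1)
print(climatology.shape, round(float(slope), 4))
```

Doing this by hand for every dataset (with real calendars, missing data and irregular seasons) is exactly the drudgery the CDAT processing packages were written to eliminate.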
It's often easier to make sense of a new package by actually seeing it in action (i.e. by copying someone else's code!). Feel free to check out my source code, as I frequently make use of the packages listed above.
STEP 4: FOLLOW THE PYAOS BLOG
By following the Python for the Atmospheric and Oceanic Sciences (PyAOS) blog, you can keep up to date with the latest Python developments relevant to our field.