Skip to main content

Python scripting

Python is a versatile language for writing scripts, i.e. a series of commands that gets interpreted and executed on the fly. You can write Python scripts to make plots, read or write data files, reduce data, re-organise files, access resources on the network, send e-mail, etc. It is not astronomy-specific and has a large and growing user-base across many scientific disciplines.

Here are links to some common tasks and features as listed on this page:

  1. Basic Python script
  2. Numerical Python
  3. Reading FITS files in Python
  4. Plotting with pgplot in Python
  5. Reading ASCII files
  6. Periodograms
  7. Finding modules
  8. Finding out more about Python

A basic Python script

Here is a script to show some features of Python:

#!/usr/bin/env python

x = 2.
print x, 2*x, 1/x

# load mathematical functions
from math import *

# use a couple of them (natural log and exponential function)
print log(x), exp(x)

# create a Python list
al = [1.,2.,3.,4.,5.]

# loop through it
for num in al:
   print 'number =',num

# define a function
def doubleit(x):
	return 2*x

# use it
y = doubleit(x)

print'Twice',x,'=',y

This is mostly self-explanatory, except the first line which is a standard unix construct for the first line of scripts to say what interpreter in being used. If you create this script as a file such as 'script.py', then you can make it executable using 'chmod +x script.py' and then run it as './script.py'. Even easier the first time is to invoke the interactive python interpreter 'ipython' and type the above lines in. The only other notable line is the 'from math import *' which is one way to load an extra module. See also below. Note how in Python ':' and indentation are used where C and Perl uses braces {}.

Numerical Python

Python lists are highly flexible and can contain mixed types of data, e.g. [1,2.0,'a string'] which starts with an integer, then a float, then a string. However, they are inefficient when handling images where all one wants is a single data type which is handled fast. Therefore you are very likely to want to use the numerical module 'numpy' which has a relatively compact array class, and can handle vectorised computations. e.g.

#!/usr/bin/env python

import numpy as npy

# create a numpy array by conversion of a Python 'tuple' (between parentheses ())
arr = npy.array((1.,2.,3.,4.,5.,6.,7.,8.))

# Multiply all elements by 3
arr *= 3

# copy to another new array
narr = arr.copy()

# add first array to 1/new array
comb = arr + 1/narr

# create an array of complex numbers
real = npy.array((1.,1.,2.,3.))
imag = npy.array((-1.,2.,1.,2.))
i = complex(0.,1.)

cplx = real + i*imag

# take their exponential
ecplx = npy.exp(cplx)

Note that the above operations are carried out on all elements and especially for large arrays, can be much faster than doing so with loops inside Python. 'numpy' can handle data of arbitrary dimensions and has such features as computation of medians, averages, etc and much more besides. 'numpy' is installed here already. A particularly nice feature of numpy is shown in the next example (header code skipped for brevity, assume data is some numpy array representing an image, and that x and y are 1D arrays representing a spsectrum):

# limit data to 1000 maximum
data[data > 1000] = 1000.

# calculate mean spectrum value over the range 600 to 700 nm
print y[(x > 600) & (x < 700)].mean()

Statements like 'data > 1000' generate boolean True/False arrays which can be used as array indices. This is very powerful and often useful.

PyFits

PyFits is a module for reading and writing files in the widespread astronomical FITS data format. Here is an example of basic usage:

#!/usr/bin/env python

import pyfits

hdulist   = pyfits.open('image.fits')
data      = hdulist[0].data
fhead     = hdulist[0].header      
hdulist.close()

# subtract 1 from the data (it is a numpy array)
data -= 1

# print the data range
print 'Data ranges from',data.min(),'to',data.max()

Note that this script makes some assumptions about the format of the FITS file, but it can be easily adapted.

Plotting from within Python

More than one module is available for making plots. A popular choice is 'matplotlib'. I (TRM) use 'ppgplot', Python's implementation of the venerable pgplot plotting library. Here is an example of usage; again pgplot uses numpy arrays for speed.

#!/usr/bin/env python
#
# Script to plot a sinusoid

import numpy as npy
from ppgplot import *

# create a sinusoid running from -10 to +10
x = npy.arange(-10.,10.,0.1)
y = npy.sin(x)

# plot with pgplot
pgopen('/xs')
pgenv(x.min(), x.max(), y.min()-0.1,y.max()+0.1,0,0)
pgline(x, y)
pgclos()

Documentation of ppgplot is non-existent, but you can work out much from the standard pgplot documentation if you realise that you often do not need to specify the number of points in arguments since numpy arrays carry their own number inside them, unlike FORTRAN and C arrays.

Reading ASCII files

Often, input data is available in multi-column ASCII files. Numpy has a useful command called 'loadtxt' that reads such files and transfers column data into numpy arrays. The following line will generate two numpy arrays x,y containing the first two columns of the file, ignoring any lines that start with # and use space as the column delimiter:

x,y = numpy.loadtxt(file, unpack=True, comments='#',delimiter=' ')

Another example to select specific columns:

x,y,z = numpy.loadtxt(file, unpack=True, usecols=(0,2,3))

There are other modules available that can handle ASCII files such as CSV, and you can easily write your own custom reader using generic python file I/O that read files on a line by line basis (as a string):

for line in open(filename, 'r'):
	fields = line.strip().split(' ')
	...

Periodograms

If you point to TRM's python routines (see below) then the following script shows how to compute a periodogram:

#!/usr/bin/env python

import numpy as npy
from ppgplot import *
import trm.subs as subs

# create an x, y and uncertainty arrays; normally these
# would be from some sort of datafile
x = npy.arange(0.,10.,0.01)
y = npy.sin(50.*x)
e = npy.ones_like(y)

# compute periodogram
(f,p) = subs.fasper(x,y,e,1000,20.,pgram=False)

# plot the result
pgopen('/xs')
pgenv(f.min(), f.max(), 0., 1.1*p.max(),0,0)
pgline(f,p)
pgclos()

'trm.subs' contains the function 'fasper' for computing Lomb-Scargle periodograms using the FFT-based method of Press and Rybicki. See 'pydoc trm.subs.fasper' for more details.

Finding modules

In the examples above, 'math', 'pyfits' and 'numpy' are all modules in addition to those provided in 'core' Python. There are a huge number of such modules and for some new task it is worth finding out whether a suitable module exists. Examples are 'smtplib' for sending e-mail via SMTP servers and 'imaplib' to communicate with IMAP servers. The best way to find them is by googling, but the other problem is when you invoke a script, how does python know where to look for the installed modules? The answer is, it looks through a series of directories specified via an enviroment variable called PYTHONPATH. In tcsh for instance, the following line

setenv PYTHONPATH  /home/astro/phsaap/software/lib64/python2.5/site-packages:/usr/lib/python2.5/site-packages

directs PYTHON to look through my (TRM) python directories, where pyfits resides, as well as the system ones, where 'numpy' and 'scipy' (another very useful module) sit. If you end up writing your own Python modules, you can easily add to this.

Finding out more about Python

There are many on-line resources and these are recommended in preference to books as they will always be more up-to-date. Here are some links to these

  1. Main Python site. Loads of links from here.
  2. Python tutorial. A good place to start learning the language.
  3. Scientific Python including numpy tutorial and documentation and lots of useful modules.
  4. AstroPython hosts links to resources and tutorials relevant for astronomy
  5. AstroPy is work in progress and aims to consolidate popular python packages
  6. Python 4 Kids yes, that's you ...

The facility 'pydoc' is useful too as in 'pydoc numpy' or 'pydoc numpy.array'.

You should minimally aim to master: lists, tuples, and dictionaries (three basic data structures), strings, how to open, read and write files, the standard statements such as 'for', 'while, 'if' and how to define functions (very easy). As you get more advanced you may want to learn how to define your own object types or 'classes', and if you are carrying out fancy text operations (e.g. re-formatting tables, extracting information from program output), you may want to look at the 're' regular expression module. Finally the 'os' and 'sys' modules provide basic interaction with the system.