Python notes 4/23/07


Intro

Python is an interpreted/compiled programming and analysis language that, at first glance, looks like it could be a good (or much better) replacement for commercial packages such as IDL or Matlab. Indeed, many people in the space sciences community are now using Python in just this way, with acceptable results. Python programs can be written, compiled, and run from file-based modules similar to C, Java, and other languages, or Python commands can be entered manually in an interactive shell, similar to the C shell (or IDL). Python programs can be faceless file-in/process/file-out systems, or can involve user interaction via windows, graphics, and other GUI elements.

There is currently a large, and growing, community of Python programmers and scientists, and there is a plethora of free technical code available. Doing science with Python appears to be a good way to contribute to free scientific programming, rather than catering to (and putting money in the pockets of) companies like Research Systems and Mathworks. Furthermore, as Python is older than many commercial packages, these programs appear to have borrowed considerably from Python syntax, so that in many ways Python programming is similar to IDL or Matlab programming, whereas the causality is the other way around. However, there are some packages (e.g. matplotlib) which have been specifically designed to use syntax similar to familiar commercial products.

Good & bad

Probably the best thing about Python is that it is part of the Unix/C heritage. Probably the worst thing about Python is that it is part of the Unix/C heritage. If you are familiar with Unix systems and C programming, then you should know what I mean. Python has a distinctly Unix feel, which is appealing to some people (like myself), but which is uncomfortable to others. It is quite clear that Python evolved as a Unix system tool, kind of like a super-C-shell scripting system. Possibly the best way to describe Python's 'flavor' --and to squarely introduce some 'political' issues-- is to say that it is like Perl for smart people. Python is distinctly unlike Java, which is a self-contained universe by itself, although it shares many of the same object oriented concepts and mechanisms. Python is so intimately connected to the Unix/C world that I would not expect to find it running on, say, a VMS system (it does), without being seriously crippled. I am kind of surprised that it runs under Windows, but I understand that the OS interface is somewhat different, and that often an additional Unix-like program is run beneath Python.

Python is a very complicated language, but one in which simple things can be done fairly quickly. It is probably the fullest, most complex (does that mean 'best'?), object oriented language I have seen, and is more similar to common Lisp than it is to C++ or Java (or even Smalltalk or Eiffel). One could spend the rest of one's life fully learning Python and doing only very esoteric, ivory tower, 'pure computing' things with it. C is a much more 'down to Earth', 'nuts and bolts' language. However, that isn't really a problem, for reasons discussed below. To begin learning Python I recommend the following:

Both of the books cover up to Python version 2.5. The first book is about the size of MTW's Gravitation, but it covers lots of stuff, gets started quickly, and has lots of example programs in the book and which can be downloaded from the website. Unfortunately, learning to program Python isn't like learning Basic or even C. It's definitely a big time investment, but one that could last a long time as well. Python's been around for 20 years and gives every impression of being around for another 20 years (or evolving into something similar), so it's unlikely to be screwed up by Microsoft any time soon. Rather than learing how to to everything from scratch in Python, many people concentrate on learning how to run one of the many free scientific packages for Python (discussed below). That can be both a good and a bad thing. I'm very much a do-it-yourselfer, and I recommend that anyone serious about scientific computing with Python (or any other language) learn to be one as well.

C based

Python, and most Python extensions, are written in C (although there is also some Fortran code). When I'm running Python I'm very much aware that I'm running a program written in C (which is fine with me). Programs written in Java (e.g.) have a very different 'feel' to them (although this is a very subjective sensation. For example, running IDL programs just feels 'bad' to me, in a spiritual way). Python programs can be written in Python or in C or another compiled language (with a Python interface). Ultimately, everything that happens on a specific platform happens in compiled code, especially graphics and UI. Therefore, many Python libraries contain both Python and C code, and installing a new library can involve running the C compiler for 20 minutes or so. But this also means that anything that Python can't currently do can be coded up in a C module and compiled with a Python interface. So, you get the best of both worlds, as well as the worst of both worlds. However, for someone like myself who has been a professional Unix/C programmer for the past 20 years, this is a good thing. It means that while I can expect to spend most of my time writing and running Python code, I am not at all hesitant to switch to C when there's something new I need. But if C, compilers, linkers, and make files, etc... all seem like a 'black box' to you, then you may at times find yourself curtailed and frustrated by the limitations of the Python packages currently available to you, especially for code which interacts with the underlying hardware or OS.

Installation

Mac OS X comes pre-installed with Python 2.3.5, so you should be able to start up a shell (Terminal window) and type 'python' and get something like the following:

1: ~ > python
Python 2.3.5 (#1, Jan 13 2006, 20:13:11) 
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

for a Python 'Easter egg', type the following and hit return:

>>> import this

I won't tell you what this does, as it should be an exercise for the reader. Cognoscenti should detect the (somewhat snide) reference to Perl, and another to Python's creator. Here's another funny from the end of the Programming Python book and The Empire Strikes Back. To quit out of Python, type control-D at the triple prompt.

Unfortunately, version 2.3.5 isn't the one you want for scientific computing, and the Mac installation comes with no additional packages, so the next thing you have to do is download and install some software. And then download and install some software. And then download, compile, and install some more software. And then... Getting a usable scientific Python system is not a trivial task. I've been working on this for about a week now, and I'm not done yet, although most of the things I want to do finally seem to work. As this is Unix software, the {download, compile, install} pocess is fairly straightforward, although you do have to do things in specific order --which only becomes apparent when something you spent an hour downloading and installing doesn't work because you didn't download and install something else first, or you downloaded and installed some things out of order. Also, it sometimes seems that to install a peice of Unix software requires you to first install two other peices of software, which requires you to first etc etc...

I seem to have a fairly stable system at present, so I hope that my past week's experiences can help you to get up and running a bit faster than I did. However, what would be really great would be if someone took all the separate packages, libraries, external programs, etc..., and packed them up as a single static stand-alone application that worked right out of the box. Until then, here's my best guess at what you will need to do to install a nominal Python scientific computing environment. I've also created CDs of all the various software and manuals I've acquired, and of the final Python 'site-packages' and /usr/local directories that I created, which should at least save you some time looking for things on the web.

(NOTE: You may want to skip a bit here and scroll down the page to look at the examples and screenshots first.)

  1. Start at the SciPy page. Click on 'Download NumPy/SciPy' near the bottom. In the SciPy section of the next page, under 'Binary installation for OS X (Tiger)', command-click on the 'MacPython 2.4' link (to open a new window and save the current one).

  2. Get MacPython 2.4.4. At the 'pythonmac.org/packages/' page, click 'Packages for Universal Python 2.4.x, Mac OS X 10.3.9 and later'. On the next page, click 'python-2.4.4-macosx2006-10-18.dmg'. I also have the disk image on CD. Go through the download and installation process in the dialogs. This should put links to things in /usr/local, although the actual running code is elsewhere (find python2.4 from the Finder). Once done, you should be able to use a shell to get Python 2.4.4:

    3: ~ > python2.4
    Python 2.4.4 (#1, Oct 18 2006, 10:34:39) 
    [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    

    Entering just 'python' will probably still get you Python 2.3.5. To fix this, cd to /usr/bin and list all the pythons:

    4: ~ > cd /usr/bin
    5: /usr/bin > ls -al python*
    lrwxr-xr-x   1 root  wheel     24 Apr 17 10:24 python -> python2.3
    lrwxr-xr-x   1 root  wheel     72 Feb 19  2006 python2.3 -> ../../System/Library/Frameworks/Python.framework/Versions/2.3/bin/python
    lrwxr-xr-x   1 root  wheel     25 Apr 18 15:13 pythonw -> pythonw2.3
    -rwxr-xr-x   1 root  wheel  34216 Jan 13  2006 pythonw2.3
    

    You need to replace the python and pythonw links with ones to python2.4. To do this, you will probably have to be Super User. If so, type 'su' and the root password. Then remove the python and pythonw links with 'rm python' and 'rm pythonw' and make then point to python2.4 with 'ln -s /usr/local/bin/python2.4 python' and 'ln -s /usr/local/bin/pythonw2.4 pythonw'. If all went well, your files should now look like:

    lrwxr-xr-x   1 root  wheel     24 Apr 17 10:24 python -> /usr/local/bin/python2.4
    lrwxr-xr-x   1 root  wheel     72 Feb 19  2006 python2.3 -> ../../System/Library/Frameworks/Python.framework/Versions/2.3/bin/python
    lrwxr-xr-x   1 root  wheel     25 Apr 18 15:13 pythonw -> /usr/local/bin/pythonw2.4
    -rwxr-xr-x   1 root  wheel  34216 Jan 13  2006 pythonw2.3
    

    and you can then exit root by pressing control-D. Now saying 'python' in a shell (and in a '#!/usr/bin/python' or '#!/usr/bin/env python' line in a program file) should get you Python 2.4.4.

  3. Get the SciPy Superpack. Meanwhile, back at the 'Download NumPy/SciPy' page, right under the MacPython 2.4 link, get either the PPC or Intel 'SciPy Superpack for Python 2.4'. I have the one for Intel on CD.

    NOTE: All of the remaining directions below are for Intel Macs running 10.4.9. Presumably you do similar things for PPC Macs. However, all of the chip-specific software on my CDs is for Intel. PPC users will need to find the corresponding websites to download from.

    Open the package and install the software from the dialogs. This should put the 'numpy', 'matplotlib', and 'scipy' packages into /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/, which is where new code accessible from Python 2.4.4 should go. This is fine if you only want to use ndarrays (from numpy) with matplotlib. However, if you want to use numdisplay and DS9 to display FITS images, you will also need the numarray and numdisplay packages and will then have to recompile all the matplotlib code (see below). I did this (eventually getting it right) and recommend it, but it's probably not required. Another present you get with this package is a copy of gFortran (in /usr/local/bin).

  4. Get PyFITS. Go to the PyFITS page and click on Download. Get version 1.1rc1, which works with both numarrays and ndarrays (numpy). PyFITS is installed like most Python software. cd to the directory containing the setup.py file, and type 'python setup.py install' into a shell. This should place the resulting code into the Python 2.4 site-packages directory. The default array type for PyFITS is ndarray. To use the old numarrays, you must have numarray installed (see below) and then set a NUMERIX environment variable (with 'set NUMERIX numarray' in csh, something else in sh) before launching Python.

    NOTE: These 4 steps may be all you need to do to get started with Python. They include the pylab, scipy, matplotlib, pyfits, and numpy packages. See the respective manuals and tutorials (on the CD or from the websites) to see if this is enough for you. Most likely, however, you will need additional things described in the steps below.

    And now for something completely different...

  5. Get several C libraries. In order to compile matplotlib, the Python Image Library (PIL), and other packages and programs, you should have the following libraries in /usr/local/lib:
    libfreetype:  Freetype fonts system                        http://www.freetype.org/
    libjpeg:      JPEG image support                           http://www.ijg.org/
    libpng:       PNG image support (also in /usr/lib/libz)    http://www.libpng.org/pub/png/
    

    In general, for astronomical computing, you should also have the following libraries:

    libcfitsio:   Read/write FITS files                        http://heasarc.gsfc.nasa.gov/docs/software/fitsio/
    libfftw3:     Fourier transforms                           http://www.fftw.org/
    

    Get these libraries as .tar.gz source files. They (and most Unix software) are all built in similar ways:

    (cd to directory with configure script)
    ./configure
    make
    make install (as root)
    

    Usually these libraries are installed in /usr/local/lib, with headers in /usr/local/include, and any programs in /usr/local/bin. Some configuration scripts may require a location prefix flag: e.g. './configure --prefix=/usr/local'. See the README or INSTALL files in each module for specific instructions. If the libraries and headers don't appear in /usr/local as desired, you may have to copy them manually from wherever they were built. Usually, you need to be root to copy to /usr/local. You may also need to run ranlib on a moved library to update its timestamp, if the linker complains.

  6. Get numarray. Lots of software still requires this older array format (e.g. numdisplay), so you may as well have it. All new software should be written to use ndarrays or matrixs [sic] (numpy). Get numarray here. Build with 'python setup.py config install --gencode', or see the INSTALL file for details.

  7. Get numdisplay. This is used to send numarrays (e.g. FITS files) from Python to DS9 for display. Here is the page. Build with 'python setup.py install'.

  8. Get X11. This should already be on your Mac, or get it off of your installer disk in the options package. Get the 2006 update here.

  9. Get SAOImageDS9. This is an X-Windows program used to display images. Here is the page. Get the Mac OS X Intel or PPC binary.

  10. Get matplotlib. Although the SciPy .dmg above comes with matplotlib, it is compiled only for ndarrays. If you want to use numarrays as well (e.g. to display FITS files with numdisplay and DS9), you will need to rebuild matplotlib from source. The page is here. Although there is lots of C and C++ code, it is built by the Python distutils system: 'python setup.py install'. The C compiler will run for quite a while.

    NOTE: To use pylab/matplotlib displays (e.g. plot(), imshow()) you need to create a .matplotlib/matplotlibrc file in your home directory. There is an example at the website or in the code directory. You need to set a couple of flags in this file to get output:
    backend      : TkAgg
    numerix      : numpy  # numpy, Numeric or numarray
    interactive  : True      # see http://matplotlib.sourceforge.net/interactive.html
    

    This sets the GUI to use Tk, arrays as ndarrays, and interactive mode active. This assumes that you have compiled matplotlib with the default settings in setup.py:

    BUILD_AGG          = 1
    ...
    BUILD_TKAGG        = 'auto'
    

    matplotlib can also be built to use gtk or wx widgets, if you have those installed. Python comes with Tk.

  11. Get PIL. This is the Python Image Library, and allows code to read and manipulate JPEGs, PNGs, and potentially other image types (e.g. FITS) with custom code. Build with 'python setup.py install'.

  12. Get IPython. This is an interactive enhancement to the Python shell. It is not required, but some scientific packages (e.g. matplotlib, pylab, scipy) can make use of it. Here is the page. IPython is a script, rather than a binary program. Therefore you start it with Python like any other script. I put an alias in my .tcshrc file: 'alias ipython python ~/Projects/Python/ipython-0.8.0/ipython.py', so I can just type 'ipython'. Replace my path with the one for your copy of ipython.py.

  13. Get some tutorials/manuals. Some good ones are:

    Matplotlib tutorial
    Matplotlib user's guide
    PyFITS user's manual
    Using Python for interactive data analysis
    SciPy course outline
    NumPy documentation (costs $40)

Using FITS files

Here's a quick demo of how you can use FITS files with Python. In this case, I want to display them using a variety of methods, so I will use the older numarray array format. To do this, I first set a flag in my .matplotlib/matplotlibrc file:

numerix      : numarray  # numpy, Numeric or numarray

I then set a shell environment variable for PyFITS:

> setenv NUMERIX numarray

I then launch SAOImageDS9 under X-Windows, and run the following Python commands:

> python
Python 2.4.4 (#1, Oct 18 2006, 10:34:39) 
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyfits
>>> from numarray import *
>>> import numdisplay
>>> im = pyfits.getdata('c1230.en.map.fits')
>>> im.__class__
<class 'numarray.numarraycore.NumArray'>
>>> numdisplay.display(im)
Image displayed with Z1:  0.0  Z2: 1000.0

The FITS file is imported as a NumArray object, and is sent to DS9 for display. I can also display the file in a matplotlib/pylab Tk window:

>>> from pylab import *
>>> imshow(im)

I can also display the file in a Cocoa window (although this is currently a C program launched from the shell):

>>> import os
>>> os.system('~/Projects/FITS/XC/plotfits c1230.en.map.fits')

To use the Cocoa display as a Python function, I would have to package the C program as a Python extension. However, Cocoa windows and graphics offer significant advantages over either Tk or X-Windows, but can only be run under OS X.

To summarize, there are at least 3 ways to display images from Python:

  1. In a Tk (or other widget) window launched from the Python program.
  2. In an X-Windows program running concurrently with Python.
  3. In a Cocoa (Mac-native) window launched as a Python extension.
The level of control, in terms of GUI, increases from 1-3 above, with Cocoa being the best, but also the least portable.

FITS arrays can also be manipulated and plotted in other Tk windows. Some array syntax will be demonstrated below, but in general it is similar to either IDL or Matlab, although more powerful than either:

>>> im.shape  
(700, 700)
>>> plot(im[:, 350])

>>> plot(im)

NumPy arrays

Here's a very brief demo of NumPy array objects. Start Python, import numpy and create some ranges:

> python
Python 2.4.4 (#1, Oct 18 2006, 10:34:39) 
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from numpy import *
>>> a = range(5); a
[0, 1, 2, 3, 4]
>>> a = [range(5), range(5, 10), range(10, 15)]; a
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]

Create an array out of ranges, get its shape, then create a new real array from a shaped range:

>>> a = array([range(5), range(5, 10), range(10, 15)]); a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.__class__
<type 'numpy.ndarray'>
>>> a.ndim, a.shape
(2, (3, 5))
>>> a = arange(15).reshape(3, 5) * 1.0; a
array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.]])

Reset the value of a row and column:

>>> a[:, 2] = -a[:, 2]; a
array([[  0.,   1.,  -2.,   3.,   4.],
       [  5.,   6.,  -7.,   8.,   9.],
       [ 10.,  11., -12.,  13.,  14.]])
>>> a[1, :] = -a[1, :]; a
array([[  0.,   1.,  -2.,   3.,   4.],
       [ -5.,  -6.,   7.,  -8.,  -9.],
       [ 10.,  11., -12.,  13.,  14.]])

Try to take the real sqrt(), and then the complex sqrt():

>>> sqrt(a)
Warning: invalid value encountered in sqrt
Warning: invalid value encountered in reduce
Warning: invalid value encountered in reduce
array([[ 0.        ,  1.        ,         nan,  1.73205081,  2.        ],
       [        nan,         nan,  2.64575131,         nan,         nan],
       [ 3.16227766,  3.31662479,         nan,  3.60555128,  3.74165739]])
>>> sqrt(a + 0j)
array([[ 0.        +0.j        ,  1.        +0.j        ,
         0.        +1.41421356j,  1.73205081+0.j        ,  2.        +0.j        ],
       [ 0.        +2.23606798j,  0.        +2.44948974j,
         2.64575131+0.j        ,  0.        +2.82842712j,  0.        +3.j        ],
       [ 3.16227766+0.j        ,  3.31662479+0.j        ,
         0.        +3.46410162j,  3.60555128+0.j        ,  3.74165739+0.j        ]])

Get some stats:

>>> print a.min(), a.max(), a.mean(), a.sum(), a.std()
-12.0 14.0 1.4 21.0 8.1059648819

Find some values satisfying conditions:

>>> where(a > 0, 1, 0)
array([[0, 1, 0, 1, 1],
       [0, 0, 1, 0, 0],
       [1, 1, 0, 1, 1]])
>>> a > 0
array([[False,  True, False,  True,  True],
       [False, False,  True, False, False],
       [ True,  True, False,  True,  True]], dtype=bool)
>>> a[a > 0]
array([  1.,   3.,   4.,   7.,  10.,  11.,  13.,  14.])

Can do the same on a 1d version of arrays:

>>> a.ravel()
array([  0.,   1.,  -2.,   3.,   4.,  -5.,  -6.,   7.,  -8.,  -9.,  10.,
        11., -12.,  13.,  14.])
>>> a.ravel() > 0
array([False,  True, False,  True,  True, False, False,  True, False,
       False,  True,  True, False,  True,  True], dtype=bool)
>>> (a.ravel() > 0).nonzero()
(array([ 1,  3,  4,  7, 10, 11, 13, 14]),)
>>> a.ravel()[(a.ravel() > 0).nonzero()]
array([  1.,   3.,   4.,   7.,  10.,  11.,  13.,  14.])
>>> a.ravel()[(a.ravel() > 0).nonzero()] = 999; a
array([[   0.,  999.,   -2.,  999.,  999.],
       [  -5.,   -6.,  999.,   -8.,   -9.],
       [ 999.,  999.,  -12.,  999.,  999.]])

Compute the FFT of an array:

>>> from numpy.fft import *
>>> a = 1.0 * array([1, -1] * 4 * 8).reshape(8, 8); a
array([[ 1., -1.,  1., -1.,  1., -1.,  1., -1.],
       [ 1., -1.,  1., -1.,  1., -1.,  1., -1.],
       [ 1., -1.,  1., -1.,  1., -1.,  1., -1.],
       [ 1., -1.,  1., -1.,  1., -1.,  1., -1.],
       [ 1., -1.,  1., -1.,  1., -1.,  1., -1.],
       [ 1., -1.,  1., -1.,  1., -1.,  1., -1.],
       [ 1., -1.,  1., -1.,  1., -1.,  1., -1.],
       [ 1., -1.,  1., -1.,  1., -1.,  1., -1.]])
>>> print abs(fft(a))
[[ 0.  0.  0.  0.  8.  0.  0.  0.]
 [ 0.  0.  0.  0.  8.  0.  0.  0.]
 [ 0.  0.  0.  0.  8.  0.  0.  0.]
 [ 0.  0.  0.  0.  8.  0.  0.  0.]
 [ 0.  0.  0.  0.  8.  0.  0.  0.]
 [ 0.  0.  0.  0.  8.  0.  0.  0.]
 [ 0.  0.  0.  0.  8.  0.  0.  0.]
 [ 0.  0.  0.  0.  8.  0.  0.  0.]]
>>> print abs(fftshift(fft(a)))
[[ 8.  0.  0.  0.  0.  0.  0.  0.]
 [ 8.  0.  0.  0.  0.  0.  0.  0.]
 [ 8.  0.  0.  0.  0.  0.  0.  0.]
 [ 8.  0.  0.  0.  0.  0.  0.  0.]
 [ 8.  0.  0.  0.  0.  0.  0.  0.]
 [ 8.  0.  0.  0.  0.  0.  0.  0.]
 [ 8.  0.  0.  0.  0.  0.  0.  0.]
 [ 8.  0.  0.  0.  0.  0.  0.  0.]]

The NumPy array object and packages contain dozens of additional functions for performing statistics, array manipulations, convolutions, polynomials, etc...

Matrices

NumPy also includes a matrix type, which is an NxM array. Matrices have additional properties and methods, and overload the standard operators such as multiplication.

>>> from numpy.random import *
>>> a = matrix(rand(4, 4)); a
matrix([[ 0.81634207,  0.1337322 ,  0.14083475,  0.21813705],
        [ 0.56612496,  0.8580282 ,  0.30047178,  0.3911366 ],
        [ 0.88954812,  0.02283352,  0.86269536,  0.05555847],
        [ 0.19919564,  0.0890195 ,  0.53704153,  0.32616056]])
>>> a.I
matrix([[ 1.17141069, -0.11853493,  0.27883583, -0.68879087],
        [-1.09956122,  1.43630936,  0.32853997, -1.04301787],
        [-1.28867443,  0.11722304,  0.98403588,  0.55367178],
        [ 1.70656493, -0.51263568, -1.88023184,  2.85966003]])
>>> print a.I * a
[[  1.00000000e+00   2.77555756e-17   1.11022302e-16   5.55111512e-17]
 [  1.11022302e-16   1.00000000e+00   1.11022302e-16   1.11022302e-16]
 [ -1.52655666e-16   0.00000000e+00   1.00000000e+00   0.00000000e+00]
 [ -1.11022302e-16  -5.55111512e-17  -4.44089210e-16   1.00000000e+00]]

Speed tests

I haven't written many Python programs yet, so I don't have a case-by-case comparison with C. However, I did compare a C matrix inversion/multiplication program to NumPy's matrix inversion/multiplication. The C program I used was one I wrote previously to test the relative speeds of different Mac systems. It is not meant to be the fastest possible program, just one that can be tested on different systems to give comparable results. It generates a random NxN matrix, inverts it using Gauss-Jordan elimination with full pivoting, and then multiplies the matrix by its inverse (both ways) to check the result. I have no idea how NumPy implements matrix inversion. My C program running on 1000x1000 double-precision matrices takes about 45 seconds to complete:

5: ~/Projects/Gauss-Jordan > date ; gj686 1000 0 ; date
Sun Apr 22 17:13:00 MDT 2007
Sun Apr 22 17:13:45 MDT 2007

An equivalent Python function might be:

>>> def foo(n):
...     os.system('date')
...     a = matrix(rand(n, n))
...     ai = a.I
...     b = a * ai
...     c = ai * a
...     os.system('date')
...     return c 
... 
>>> c = foo(1000)
Sun Apr 22 17:14:00 MDT 2007
Sun Apr 22 17:14:06 MDT 2007
>>> c     
matrix([[  1.00000000e+00,  -7.49400542e-14,  -1.73194792e-14, ...,
          -1.36366612e-13,  -9.56942858e-14,  -5.10216869e-14],
        [  1.03028697e-13,   1.00000000e+00,   1.77635684e-15, ...,
           5.63334102e-14,   2.99066327e-14,   8.91994811e-14],
        [  3.21964677e-15,   3.87849475e-14,   1.00000000e+00, ...,
           1.02633180e-13,   6.93282237e-14,   6.65396557e-14],
        ..., 
        [  2.07056594e-14,   1.44328993e-15,   1.35724765e-14, ...,
           1.00000000e+00,   9.30366895e-14,  -4.82253126e-15],
        [  6.59194921e-15,  -2.49800181e-15,   2.21697660e-15, ...,
          -1.39905448e-15,   1.00000000e+00,   2.49800181e-16],
        [ -3.49997809e-14,  -2.10942375e-14,  -3.20229954e-14, ...,
          -8.14452672e-14,  -7.89646126e-14,   1.00000000e+00]])
>>> print c.dtype, c.shape
float64 (1000, 1000)
>>> c = foo(2000)
Sun Apr 22 17:26:00 MDT 2007
Sun Apr 22 17:26:44 MDT 2007
>>> c
matrix([[  1.00000000e+00,  -1.51732793e-13,  -5.23192600e-14, ...,
          -6.43009951e-14,  -2.83165852e-13,  -2.89716168e-14],
        [ -2.05835349e-13,   1.00000000e+00,  -3.13832293e-13, ...,
          -2.08277839e-13,  -9.95731275e-14,   2.25167107e-15],
        [  3.22519789e-14,   1.42719170e-13,   1.00000000e+00, ...,
           2.22835639e-13,  -4.53699578e-14,  -1.32887625e-13],
        ..., 
        [ -1.85962357e-15,   1.60288449e-15,   7.93462518e-15, ...,
           1.00000000e+00,   2.08166817e-16,  -3.89271948e-15],
        [  1.57235336e-14,   1.14075416e-14,   1.83004653e-14, ...,
          -2.17187379e-15,   1.00000000e+00,   9.55659163e-15],
        [  3.62210262e-15,  -1.08940634e-15,  -7.79584730e-15, ...,
           1.76733628e-14,   1.25663369e-14,   1.00000000e+00]])
>>> print c.dtype, c.shape
float64 (2000, 2000)

Of course this isn't a direct comparison of Python's actual speed, since the inversion is certainly being done in compiled C code, and probably uses a method better than GJ. But, as a first pass, it suggests that Python is probably 'fast enough' for calculations. However, as additional information, consider the following:

>>> def foo(n):
...     os.system('date')
...     x = fft(rand(n, n))
...     os.system('date')
...     return x
... 
>>> foo(1000)
Sun Apr 22 17:34:31 MDT 2007
Sun Apr 22 17:34:31 MDT 2007
array([[  4.96001627e+02 +0.j        ,  -5.96842134e+00 +3.5645091j ,
         -1.26021109e+00 -9.10648446j, ...,   1.32986754e-01 +0.98455753j,
         -1.26021109e+00 +9.10648446j,  -5.96842134e+00 -3.5645091j ],
       [  5.12303163e+02 +0.j        ,   9.62648276e+00 +2.0764284j ,
         -1.38441143e+01 -6.20285508j, ...,  -1.51497810e+00 +7.30754387j,
         -1.38441143e+01 +6.20285508j,   9.62648276e+00 -2.0764284j ],
       [  4.97417526e+02 +0.j        ,  -9.06651415e-02 +8.20669437j,
          5.74951458e+00 -4.5737919j , ...,  -1.54866395e+01 -2.73620996j,
          5.74951458e+00 +4.5737919j ,  -9.06651415e-02 -8.20669437j],
       ..., 
       [  5.13899215e+02 +0.j        ,   1.53185048e+00 +1.18847739j,
         -5.24489950e+00 -0.7996017j , ...,   1.29652505e+00 +1.90884849j,
         -5.24489950e+00 +0.7996017j ,   1.53185048e+00 -1.18847739j],
       [  5.03255483e+02 +0.j        ,   4.98178367e+00 +0.33379411j,
         -4.45263312e+00 -0.24665999j, ...,  -6.23934729e+00 -6.99924003j,
         -4.45263312e+00 +0.24665999j,   4.98178367e+00 -0.33379411j],
       [  4.98274067e+02 +0.j        ,   3.75069495e+00+16.3833213j ,
         -6.77387368e+00 -9.41446747j, ...,  -8.00904375e+00 -1.64761378j,
         -6.77387368e+00 +9.41446747j,   3.75069495e+00-16.3833213j ]])
>>> y = foo(1000)
Sun Apr 22 17:34:40 MDT 2007
Sun Apr 22 17:34:41 MDT 2007
>>> y = foo(5000)
Sun Apr 22 17:34:50 MDT 2007
Sun Apr 22 17:35:02 MDT 2007
>>> y = foo(10000)
Sun Apr 22 17:35:11 MDT 2007
Python(340) malloc: *** vm_allocate(size=1600000000) failed (error code=3)
Python(340) malloc: *** error: can't allocate region
Python(340) malloc: *** set a breakpoint in szone_error to debug
Traceback (most recent call last):
  File "", line 1, in ?
  File "", line 3, in foo
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/fft/fftpack.py", line 89, in fft
    return _raw_fft(a, n, axis, fftpack.cffti, fftpack.cfftf, _fft_cache)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/fft/fftpack.py", line 64, in _raw_fft
    r = work_function(a, wsave)
MemoryError

GUIs

As a default, Python makes use of the Tk window/widget set, as implemented for OS X using the Aqua look-and-feel. As expected, while this set provides basic functionality for GUIs consisting of windows, menus, buttons, sliders, dialogs, alerts, drawing, images, and hooks to the underlying OS, it doesn't look or work as well as a native GUI done with Interface Builder. There are several other window/widget sets available for Python which may be better looking/working, and it is also possible to use the native GUI through a set of Cocoa hooks (although this makes the UI Mac-only). It is also possible for Python to communicate with X-Windows GUI programs and use them for a front end, but they don't really look or work better than the Tk, etc..., widgets.

Python/Tk also appears to be able to handle the complexity necessary to create something like Scientist's Workbench:

Complete programs

While it appears possible to create a Python program with a usable GUI, it is still a question whether it is possible, or easy, to create a complete interactive program having the complexity of FITSRegister or FITSFlow.

It seems clear that Python is capable of performing the calculations of either FITSRegister or FITSFlow, but it is not clear if Tk or another Python window/widget set is good enough to create the kind of GUI that can be made with Interface Builder and the Cocoa Application Kit classes, either from Objective C or from Python. It might be worthwhile to spend a little time investigating this question, as it should be fairly quickly obvious whether there is much chance of this or not. And if it is not possible with Tk, perhaps one of the other window/widget sets (e.g. wx, gtk) might be more appropriate. One thing I can say at present, however, is that I would not want to try to create a program like FITSRegister or FITSFlow (or even DS9) using only the X11 library. I also have some doubts that a good GUI can be created entirely programmatically --relying on geometry managers, packers, and the like--, rather than via an interactive (and manual) layout program such as IB, although Java and Swing seem to be able to produce reasonable results (on the Mac at least). I suppose that is something else to consider, whether we might be better off to move to Java, rather than Python, especially if interplatform operability is a goal.

Conclusion

The question we must now ask ourselves is whether it is worthwhile to continue to work with Python, with the expectation that it will eventually yield beneficial results, or whether we should stay with Objective C and Cocoa --but be tied to the Mac--, or whether we should investigate another, potentially cross-platform, development system like Java. The advantages to using Python seem to be:

  1. It appears to be a good overall language, with a particularly good object oriented system.
  2. It is Unix/C based.
  3. There are many other astronomers using Python, and there is a considerable amount of free code available.
  4. Python is very similar to IDL, Matlab, and other non-free systems.
  5. Python can be extended with C.
  6. Currently existing Python libraries make use of fast C code (e.g. for FTTs and matrix inversion).
The disadvantages to using Python seem to be:

  1. It involves learning a new language that is considerably different from, and more complicated than, C, Fortran, Java, etc...
  2. It is Unix/C based.
  3. It is somewhat involved to install and set up.
  4. It's GUI capabilities may leave much to be desired, especially for complicated programs.
  5. While programs can be written in Python or C, it may require C and a complicated programming interface to do anything efficiently.
  6. Current libraries and packages, especially those written in C, may be 'black boxes' and have limited flexibility to end users.
For myself, the short answer to this question is yes. I am familiar with Unix and C, I like Unix and C, and I like what I have seen of Python so far. In part, I see Python as an extension of C, and I would expect to have little trouble working in either language, or in the combination. By contrast, I don't really like Java. I've recently been working on a Java project, and I can confirm this dislike. But, I acknowledge that Java should be considered as an alternative. Part of what I don't like about Java, and what may be an insurmountable impediment to Python as well, are the graphics and user interface systems. Although I've used a bunch of nondenominational graphics and GUI systems over the years, I have never found one that works nearly as well as whatever happens to be native to the underlying hardware and OS. Furthermore, while I have used many different languages (including ones like APL, Lisp, and Logo) I have never found one that (in my not so humble opinion) did anything much better than C (although C++ and ObjC seem to be useful for UIs). Nevertheless, I could see myself becoming serious about Python on a permanent basis, but mostly because I view it as a logical extension of Unix and C. Also, I know that I could interface with the Mac-native graphics and UI if necessary.

I'm going to spend another week, at least, working with Python, and should have more info then. In the face of doubt, I would suggest trying to write some non-trivial program in Python as a test case. I would pick something that would be useful to the Venus project, and something that involved graphics and UI, but which would be useful even without a GUI. I would pick something that could be accomplished during the next month (i.e. May), as that would not represent too great of an investment. If that goes well, we could continue using Python. If not, then we can either fall back to C, or try something else.


İSky Coyote 2007