Sunday, February 12, 2012

Python modules for scientific image processing



Recently my friend Nick Labello, who works at the University of Chicago, ran a large-scale, data-heavy image processing job.  It took one week to process data that had been acquired over 12 hours.  The program was written in Matlab and ran on a desktop.  If the processing time needs to be reduced, the best course of action is to parallelize the program and run it on multiple cores or nodes.  Parallelizing Matlab can be expensive, as it is closed-source commercial software.  Python, on the other hand, is free and open-source, so it can be scaled to a large number of cores without licensing costs.  Python also has parallelization modules like mpi4py that ease the task.  With the choice of programming language clear, Nick evaluated the various Python modules for image processing, and the chosen module was used to rewrite the Matlab program.  Specifically, he reviewed the following:

  1. Numpy
  2. Scipy
  3. Pymorph
  4. Mahotas
  5. Scikits-image
  6. Python Imaging Library (PIL)
Nick’s views on the various modules, along with some of my own, are given below.

Numpy

It doesn't actually give you any image processing capabilities, but all of the image processing libraries rely on Numpy arrays to store the image data.  As a happy result, it is trivial to bounce between all of the different image processing libraries in the code because they all read and write the same datatype. 
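As a minimal sketch of that interchange, the snippet below builds an image as a plain Numpy array, passes it to a library function, and then continues with ordinary Numpy arithmetic on the result.  The choice of `scipy.ndimage.gaussian_filter` and the tiny 5x5 test image are assumptions made purely for illustration; they are not taken from Nick's program.

```python
import numpy as np
from scipy import ndimage

# A tiny synthetic "image": a single bright pixel on a dark background.
image = np.zeros((5, 5), dtype=float)
image[2, 2] = 1.0

# The library call takes a Numpy array and returns a Numpy array...
smoothed = ndimage.gaussian_filter(image, sigma=1.0)

# ...so the result can be handed straight to plain Numpy code, or to
# another array-based library, without any conversion step.
normalized = smoothed / smoothed.max()
```

Because both the input and the output are ordinary arrays, the same `normalized` array could be passed on to any of the other array-based libraries discussed here.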

Scipy

Its ndimage module provides stable, solid, easy-to-use basic functionality (erosion, dilation, etc.).  It is missing many of the more advanced functions you might find in Matlab, such as functions to find endpoints or perimeter pixels.  These must be pieced together from elementary operations.  The documentation for ndimage is not very good for new users; I had a hard time with it at first.
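To illustrate what "pieced together from elementary operations" can look like, here is a hedged sketch of one common way to find the perimeter pixels of a binary mask: erode the mask and keep the pixels that the erosion removed.  The helper name `perimeter_pixels` is hypothetical, and this is not necessarily the approach Nick used.

```python
import numpy as np
from scipy import ndimage


def perimeter_pixels(mask):
    """Return the foreground pixels that touch the background.

    Hypothetical helper: a pixel is on the perimeter if it belongs to
    the mask but does not survive a binary erosion.
    """
    mask = np.asarray(mask, dtype=bool)
    # Default structuring element: 4-connected cross in 2-D.
    eroded = ndimage.binary_erosion(mask)
    return mask & ~eroded


# A 5x5 filled square inside a 7x7 image.
mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True

perimeter = perimeter_pixels(mask)
```

With the default structuring element, only the outer ring of the square is kept; passing a different `structure` to `binary_erosion` would change which pixels count as neighbours.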

PyMorph

It is a pure-Python library with many image processing functions that rely on nothing but Numpy; it does not depend on Scipy.  It also has quite a lot of functionality compared to the other libraries.  Unfortunately, since it is written in Python, it is hundreds of times slower than Scipy and the other libraries.  This will become an issue for advanced image processing users.


Mahotas  

It provides only the most basic functions, but they are blazing fast: in my experience, about twice as fast as the equivalent functions in Scipy.

Scikits-Image  

It picks up where Scipy leaves off and offers quite a few functions not found in Scipy.

Python Imaging Library (PIL)  

It is free to use, and its source is available under a permissive license.  It has very few image processing algorithms and is not necessarily useful for scientific imaging.


Enthought  

It is __not__ an image processing library.  It is a Python distribution ready for scientific programming.  It has over 100 scientific packages pre-installed and is compiled against fast MKL/BLAS.  My tests ran about 2-3% faster in Enthought Python than in regular Python.  Your mileage may vary.  The big advantage, though, is that it has everything I need except Mahotas and PyMorph.  Also, Mahotas installs easily on Enthought, whereas it was difficult to install on the original Python installation because of various dependencies.  Enthought is free for students and employees of universities and colleges.  Others need to pay for it.

Ravi's view: Personally, I do not have much trouble installing Python modules into a regular Python interpreter.  The only time it becomes difficult is when installing scientific packages like Scipy, which have numerous dependencies such as the BLAS and LAPACK libraries.  Ready-made binary packages eliminate the installation steps but are not necessarily well tuned for optimal performance.  Enthought is a great alternative that is just as easy to install and yet is optimized for performance.
