Saturday, March 3, 2012

Comparing Matlab and Python for large image processing problems

Matlab is a popular scripting language for numerical computing, and much of its power comes from its many toolboxes.  Since Matlab is developed commercially, dedicated programmers are constantly adding new features and enhancing existing ones.  Large image data sets created with the latest imaging modalities typically take a long time to process.  The usual approach is to run the processing in parallel on many cores, whether on a desktop, a cluster or a supercomputer.  Commercial software like Matlab requires a dedicated license, such as the Distributed Computing Server, to run such parallel jobs, and that license comes at an added cost.

Graphics Processing Unit (GPU) programming is becoming more popular and easier than ever.  The license cost for GPU programming in Matlab is typically lower than that for parallel programming.  GPU programming pays off when there is a large amount of computation and comparatively little data transfer.  This limit exists because of the limited bandwidth between the CPU and the GPU, which is the path data usually has to take to be processed on the GPU.  Image processing on large data sets typically involves a large amount of I/O and data transfer between the CPU and the GPU, and hence may not scale well enough.
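
The trade-off between computation and data transfer can be made concrete with a back-of-the-envelope estimate.  The sketch below is only a rough model, not a benchmark: the function names are hypothetical, and the default bandwidth (6 GB/s) and GPU throughput (500 GFLOP/s) are illustrative assumptions that you should replace with figures for your own hardware.

```python
def gpu_time_estimate(n_bytes, flops_per_byte, pcie_gb_s=6.0, gpu_gflops=500.0):
    """Return (transfer_s, compute_s) for one round trip through the GPU.

    pcie_gb_s and gpu_gflops are assumed, illustrative hardware figures.
    """
    # Data travels host -> device and the result travels device -> host.
    transfer_s = 2 * n_bytes / (pcie_gb_s * 1e9)
    # Total work is the data size times the arithmetic done per byte.
    compute_s = n_bytes * flops_per_byte / (gpu_gflops * 1e9)
    return transfer_s, compute_s


def transfer_bound(n_bytes, flops_per_byte, **hardware):
    """True when moving the data costs more time than processing it."""
    transfer_s, compute_s = gpu_time_estimate(n_bytes, flops_per_byte, **hardware)
    return transfer_s > compute_s


if __name__ == "__main__":
    one_gb = 1e9
    for flops_per_byte in (10, 1000):
        t, c = gpu_time_estimate(one_gb, flops_per_byte)
        print(flops_per_byte, "flops/byte: transfer", t, "s, compute", c, "s")
```

Under these assumed figures, a simple filter that does only a few operations per byte is dominated by transfer time, while an algorithm that does a lot of arithmetic per byte is dominated by compute time and is the better candidate for the GPU.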

Advantage Python

Python is a free and open source scripting language.  It can be scaled to a large number of processors.  It has been shown that there is no significant difference in computational time between a program written in Matlab and one written in Python using NumPy.  The author of that comparison found that the NumPy run time is of the same order of magnitude as that of Fortran and C programs with optimization enabled.  Although those timings are for slightly older versions of the various languages, my experience has been that the range of processing times remains similar.  The processing time will also vary depending on the nature of the algorithm and the expertise of the programmer.
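
How much the programmer's approach matters is easy to see for yourself.  The sketch below thresholds the same image twice, once with an explicit Python loop and once with a vectorized NumPy expression, and times both.  The function names, the image size and the threshold are my own choices for illustration, and the timings you get will depend on your machine and library versions.

```python
import timeit

import numpy as np


def threshold_loop(img, t):
    """Threshold pixel by pixel with an explicit Python loop."""
    out = np.zeros(img.shape, dtype=np.uint8)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            if img[r, c] > t:
                out[r, c] = 1
    return out


def threshold_numpy(img, t):
    """Threshold the whole array at once with a vectorized comparison."""
    return (img > t).astype(np.uint8)


# A random array stands in for a real image.
rng = np.random.default_rng(0)
img = rng.random((256, 256))

if __name__ == "__main__":
    loop_s = timeit.timeit(lambda: threshold_loop(img, 0.5), number=3)
    numpy_s = timeit.timeit(lambda: threshold_numpy(img, 0.5), number=3)
    print("loop: ", loop_s, "s")
    print("numpy:", numpy_s, "s")
```

Both functions return the same result; only the way the work is expressed differs.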

Python has parallel processing modules such as mpi4py that allow an application to scale to a large number of cores.  In a supercomputing environment, Python can be run on as many of the cores as are available.  Matlab, on the other hand, can scale only as far as the number of available licenses allows.
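
As a minimal sketch of what this looks like with mpi4py, the script below hands out image slices to MPI processes in round-robin fashion and gathers the results on the first process.  The per-slice work (`process_slice`), the helper names and the randomly generated slices are placeholders of my own, standing in for a real pipeline that reads images from disk.  If mpi4py is not installed, the script falls back to a single process.

```python
import numpy as np

try:
    from mpi4py import MPI  # needs an MPI installation on the system
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    comm, rank, size = None, 0, 1  # serial fallback


def process_slice(img):
    """Placeholder work: count the pixels brighter than the slice mean."""
    return int((img > img.mean()).sum())


def slices_for_rank(n_slices, rank, size):
    """Round-robin assignment: this rank takes slices rank, rank+size, ..."""
    return range(rank, n_slices, size)


def run(n_slices=8, shape=(64, 64)):
    local = {}
    for i in slices_for_rank(n_slices, rank, size):
        # Random data stands in for reading slice i from disk.
        img = np.random.default_rng(i).random(shape)
        local[i] = process_slice(img)
    if comm is None:
        return local
    # Collect every rank's dictionary on rank 0.
    gathered = comm.gather(local, root=0)
    if rank != 0:
        return None
    merged = {}
    for part in gathered:
        merged.update(part)
    return merged


if __name__ == "__main__":
    results = run()
    if rank == 0:
        print(results)
```

On a typical MPI installation this would be launched with something like `mpiexec -n 4 python script.py`, where the script name is whatever you saved the file as.  The exact launcher command depends on the cluster.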

Python also has GPU programming capability through pycuda, for applications where GPU programming is a good fit.

Python is a more general-purpose language than Matlab.  Hence it is easy to integrate databases, servers, file handling and string handling into a program.
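
As a small illustration using only the standard library, the sketch below parses slice file names with ordinary string handling and stores a per-slice measurement in an SQLite database.  The file naming scheme, the table layout and the numbers are all made up for the example.

```python
import sqlite3
from pathlib import Path


def parse_name(filename):
    """Split a hypothetical name like 'sampleA_z0012.tif' into (sample, z)."""
    stem = Path(filename).stem              # drop the '.tif' extension
    sample, _, z = stem.rpartition("_z")    # split at the last '_z'
    return sample, int(z)


def record_results(conn, results):
    """Store {filename: mean_intensity} pairs in a 'slices' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS slices "
        "(sample TEXT, z INTEGER, mean_intensity REAL)"
    )
    rows = [(*parse_name(name), mean) for name, mean in results.items()]
    conn.executemany("INSERT INTO slices VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
record_results(conn, {"sampleA_z0012.tif": 0.42, "sampleA_z0013.tif": 0.47})
rows = conn.execute("SELECT sample, z FROM slices ORDER BY z").fetchall()
print(rows)
```

Replacing `":memory:"` with a file path keeps the results on disk, so they can be queried later.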

Disadvantage Python

Because Python is an open-source project, it does not have as wide a variety of canned functions and toolboxes as Matlab.  Hence, users have to develop some of them on their own.  I hope that this issue will be resolved over time.