Images & Visualization

My new textbook and job

2014-05-10T13:50:00.001-05:00

So many things have happened in the last few months that I have not had time to add new entries to my blog. The first is the textbook that I co-authored with Sridevi Pudipeddi, which was released at the end of February. It was a marathon run to complete the book and get it ready for publishing. I also moved to a new job in California in April.

During my work as an image processing consultant at the Minnesota Supercomputing Institute, I worked with students in various disciplines of science. In all these cases, images were acquired using x-ray, CT, MRI, electron microscopes and optical microscopes. It is important that students know both the physical methods of obtaining images and the analytical processing methods in order to understand the science behind the images. Thus, a course in image acquisition and processing has broad appeal across the STEM disciplines and is useful for transforming undergraduate and graduate curricula to better prepare students for their future.

There are books that discuss image acquisition alone and there are books on image processing alone. However, image processing algorithms depend on the image acquisition method. We wrote a book that discusses both, so that students can learn from one source. You can check out a sample chapter of the book at reedwith.us. You can buy the book at Amazon by clicking on the image below.

I also changed jobs. I started working as a Senior Engineer at Elekta in Sunnyvale, CA. I will be focusing mostly on x-ray and CT during my tenure at Elekta.

Installing large number of packages in R - method 2

2013-09-09T10:08:00.003-05:00

In one of my previous blog posts, I discussed a method for installing multiple R packages from one version of R in another. In that post, I used a combination of R and Python. In this post, I will present a method that uses only R.

There are two functions in the file (listed below), listmultipack and installmultipack. You can copy the content of the listing below and name it 'packageinstall.r'.

The function listmultipack reads the list of all packages installed in a given version of R and writes them to a file. The default file name is 'requirements.txt', chosen to follow the Python convention. Alternatively, you can pass any other file name.

Once the new R version is installed on the same machine or a different machine, the installmultipack function can be used to install all the packages listed in 'requirements.txt' or in the file that you chose. Finally, the function prints all warnings that were generated during the installation.

To use the file, load it at the R command prompt using

>> source("packageinstall.r")

To write the list of packages to the file 'mylist.txt', type

>> listmultipack('mylist.txt')

To install all the packages in the mylist.txt file, type

>> installmultipack('mylist.txt')

Alternatively, if you do not have a previous installation of R and would like to install multiple R packages using one command, you can create the requirements file yourself by listing the package names in a text file. Each package name should be on a line by itself.

Checkpointing in Python

2013-02-14T16:31:00.002-06:00

Scientific programs are known to be computationally expensive. They generally require a significant amount of time for processing. For example, a program might run for several hours or days. It is not always possible to guarantee that the machine on which the program is running will be available to the user over such a long period. In such cases, checkpointing can be used to store the state of the program at different times, so that the program can be resumed without restarting the computation from the beginning. In this post, we will discuss a method for checkpointing Python programs.

The first program shown below adds all the numbers from 1 to 200 and prints the sum. This program might look too simplistic for scientific computation but it provides a good platform for discussing checkpointing. Instead of computing the sum from 1 to 200, imagine computing it up to a really high value, such as trillions. In such a case, the computation might run for hours. If the program is interrupted, the computation needs to be started from the beginning.

The second program is the checkpointed version of the first program. In checkpointing, the current state of the program is stored in a file. Whenever the program runs, it looks for the checkpoint file so that it can restore the program state. If the file does not exist, the program assumes that it is being run for the first time. During the computation, the program writes its state to the file at regular intervals. The exact interval depends on the problem being solved. If the program runs successfully, the checkpoint file is no longer needed and is removed from the disk.

In lines 7-12, the program checks whether the checkpoint file exists and, if it does, reads its content and stores it in the variables start and total_sum. If the checkpoint file does not exist, default values are used. During the computation, the current state of the program is written to the checkpoint file (checkpt_file) every 5th iteration. The sleep statement is added to slow the execution. If the program completes successfully, the checkpoint file is removed using the os.unlink method (line 28).

The checkpoint file is written and read using Python's pickle module. Thus, any Python datatype that can be pickled can be stored. Alternative formats such as HDF5, CSV, XLS etc. can also be used to store the state. pickle was chosen because it is built in to Python and because the data being stored is a picklable dictionary.
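The listings themselves are not part of this text, so the following is only a minimal sketch of what the checkpointed program could look like, not the original code, and its line numbers will not match the ones cited above. The names start, total_sum and checkpt_file follow the description; the function name checkpointed_sum and the stop_at argument (which simulates an interruption) are assumptions added for illustration, and the sleep statement is left out.

```python
import os
import pickle


def checkpointed_sum(n, checkpt_file="checkpoint.pkl", every=5, stop_at=None):
    """Sum the integers 1..n, saving the state every `every` iterations.

    stop_at is a hypothetical hook: if given, the run is aborted just
    before that iteration to imitate an interrupted job.
    """
    # Restore the saved state if a checkpoint exists, else use defaults.
    if os.path.exists(checkpt_file):
        with open(checkpt_file, "rb") as fp:
            state = pickle.load(fp)
        start, total_sum = state["start"], state["total_sum"]
    else:
        start, total_sum = 1, 0

    for i in range(start, n + 1):
        if stop_at is not None and i == stop_at:
            raise RuntimeError("simulated interruption")
        total_sum += i
        # Store the next iteration to run and the partial sum.
        if i % every == 0:
            with open(checkpt_file, "wb") as fp:
                pickle.dump({"start": i + 1, "total_sum": total_sum}, fp)

    # The run finished, so the checkpoint is no longer needed.
    if os.path.exists(checkpt_file):
        os.unlink(checkpt_file)
    return total_sum
```

In this sketch, a run that is stopped at iteration 7 leaves behind the checkpoint written at iteration 5, so calling the function again resumes at iteration 6 with the partial sum already in place.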

The image below is a snapshot taken after running the second program. The program was interrupted at the 7th iteration. When the program was restarted from the command line, it began with iteration 6, as the state of the program up to iteration 5 was stored in the checkpoint file.

Installing large number of packages in R

2012-10-12T09:18:00.000-05:00

I have been learning R, more specifically how to install and maintain R for users. Recently, I had to install the latest version, 2.15.1, from source. Once it was installed, the next step was to install the packages from the older version (2.15.0) in the latest version. Although installing packages in R is as simple as invoking the function install.packages(), it quickly becomes cumbersome when you have to install more than 400 packages.

Instead I resorted to a combination of R and Python to complete this process.

First, determine the list of all packages in the older version of R using the following commands

packs <- installed.packages()
exc <- names(packs[,'Package'])

Here, exc contains a column of package names.

Then, store this list in a text file, 'test.Rdata', so that it can be processed using Python.

write(exc,'test.Rdata')

To install more than one package, a command similar to the one below can be used. This command will install the three packages BiocInstaller, coda and DEGseq. However, the aim of this blog post is to describe a method to install many more than these three packages.

install.packages(c("BiocInstaller", "coda", "DEGseq"),dependencies=TRUE)

The Python script that generates this command is given below. The program reads the column of package names and concatenates them after adding " and , as appropriate.

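# Read the package names written by R, one per line
with open("test.Rdata", "r") as fp:
    names = [line.strip() for line in fp if line.strip()]

# Quote each name and join with commas; R rejects a trailing comma inside c()
quoted = ','.join('"' + name + '"' for name in names)
s = 'install.packages(c(' + quoted + '),dependencies=TRUE)'
print(s)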

Finally, copy the output of the Python program into the R command line and wait a few hours for the installation to finish.

I am assuming this is a common problem. How do you handle it? You can give your advice in the form of comments.

PS: The concatenation can be performed using any other scripting language, such as Perl, PHP or bash.

Comparing Matlab and Python for large image processing problems

2012-03-03T18:46:00.001-06:00

Matlab is a popular scripting language for numerical computing. It is popular and powerful due to its various toolboxes. Since Matlab is developed commercially, dedicated programmers are continually adding new features and enhancing the existing ones. Large image data sets created using the latest imaging modalities typically take a long time to process. The typical approach is to run the processing in parallel on many cores, whether on a desktop, a cluster or a supercomputer. Commercial software like Matlab requires a dedicated license, such as the Distributed Computing Server, for running such parallel jobs, which adds to the cost.

Graphics Processing Unit (GPU) programming is becoming popular and easier than ever. The license cost for GPU programming in Matlab is typically lower than that for parallel programming. GPU programming is useful if there is a large amount of computation and little data transfer. This limit exists because of the limited bandwidth between the CPU and the GPU, the typical path that data has to take for processing on the GPU. Image processing of large data sets typically involves a large amount of I/O and data transfer between the CPU and the GPU and hence may not scale well.

Advantage Python

Python is a free and open source scripting language. It can be scaled to a large number of processors. It has been shown that there is no significant difference in computational time between a program written in Matlab and one written in Python using numpy. The author of that comparison found that numpy run time is of the same order of magnitude as that of Fortran and C programs with optimization enabled. Although the processing times are for slightly older versions of the various languages, my experience has shown that the range of processing times remains similar. The processing time will also vary depending on the nature of the algorithm and the expertise of the programmer.

Python has parallel processing modules like mpi4py that allow scaling the application to a large number of cores. In a supercomputing environment with a large number of cores, Python can be run on many of them. Matlab, on the other hand, can only scale as far as the number of licenses allows.

Python also has GPU programming capability through pycuda, in case GPU programming suits the application.

Python is a more general purpose language than Matlab. Hence, integrating databases, servers, file handling and string handling into a program is easy.

Disadvantage Python

Since Python is an open-source package, it does not have as wide a variety of canned functions and toolboxes as Matlab. Hence, the user has to develop some of them. I hope that over time this issue will be resolved.

Under-graduate education in Image processing

2012-02-17T19:14:00.001-06:00

SURVEY URL: http://goo.gl/ORDDz

Image acquisition and processing have become standard methods for qualifying and quantifying experimental measurements in various Science, Technology, Engineering and Mathematics (STEM) disciplines. Discoveries have been made possible in medical sciences by advances in diagnostic imaging such as x-ray based computed tomography (CT) and magnetic resonance imaging (MRI). Biological and cellular functions have been revealed with new imaging techniques in light based microscopy. Advancements in material sciences have been aided by electron microscopy analysis of nanoparticles. All these examples and many more require knowledge of both the physical methods to obtain images and the analytical processing methods to understand the science behind the images.

Imaging technology continues to advance with new modalities and methods available to students and researchers in STEM disciplines. Thus, a course in image acquisition and processing would have broad appeal across the STEM disciplines and be useful for transforming undergraduate and graduate curriculum to better prepare students for their future.

Image analysis is an extraordinarily practical technique that need not be limited to highly analytical individuals with a math and engineering background. Since researchers in biology, medicine and chemistry, along with students and scientists from mathematics, physics and various engineering fields, use these techniques regularly, there is a need for a course that provides a gradual introduction to both acquisition and processing.

Such a course will prepare students in the common image acquisition techniques like CT, MRI, light microscopy and electron microscopy. It will also introduce the practical aspects of image acquisition such as noise, resolution etc. The students will also program image processing operations in Python as a part of the curriculum. They will be introduced to Python modules such as numpy and scipy. They will learn the various image processing operations like segmentation, morphological operations, measurements and visualization.

We wanted to support our understanding of this need with data collected by surveying people interested in image processing, or people who wish that they had had such a course during their senior year as undergraduates or in graduate school. We created a survey to obtain your feedback. It will take only a minute of your time. We request that you fill in as much information as you can. Please forward this URL or this blog post to your friends as well.

Python modules for scientific image processing

2012-02-12T21:37:00.003-06:00

Recently my friend Nick Labello, who works at the University of Chicago, ran a large scale image processing job on a large data set. It took one week to process the data acquired over 12 hours. The program was written in Matlab and was run on a desktop. If the processing time needs to be reduced, the best course of action is to parallelize the program and run it on multiple cores / nodes. Matlab parallelization can be expensive as it is closed-source commercial software. Python, on the other hand, is free and open-source and hence can be parallelized to a large number of cores. Python also has parallelization modules like mpi4py that ease the task. With the choice of programming language clear, Nick worked on evaluating the various Python modules. The chosen Python module was then used to rewrite the Matlab program. Specifically, he reviewed the following

Numpy

Scipy

Pymorph

Mahotas

Scikits-image

Python Imaging Library (PIL)

Nick's views on the various modules, along with some of my own, are given below.

Numpy

It doesn't actually give you any image processing capabilities, but all of the image processing libraries rely on Numpy arrays to store the image data. As a happy result, it is trivial to bounce between all of the different image processing libraries in the code because they all read and write the same datatype.

Scipy NDImage

It provides stable, solid, easy to use but very basic functionality (erosion, dilation, etc.). It is missing a lot of the more advanced functions you might find in Matlab, such as functions to find endpoints, perimeter pixels, etc. These must be pieced together from elementary operations. The documentation for NDImage is NOT very good for new users. I had a hard time with it at first.

PyMorph

It has lots of image processing functions that do not rely on anything but Numpy. It does not depend on Scipy. It is a pure-Python library. A second interesting thing about PyMorph is that it has quite a lot of functionality compared to the other libraries. Unfortunately, since it is written in Python, it is hundreds of times slower than Scipy and the other libraries. This will become an issue for advanced image processing users.

Mahotas

It provides only the most basic functions, but it is blazing fast and, in my experience, twice as fast as the equivalent functions in Scipy.

Scikits-Image

It picks up where Scipy leaves off and offers quite a few functions not found in Scipy.

Python Imaging Library (PIL)

It is free to use and is distributed under a permissive open-source license. It has very few image processing algorithms and is not necessarily useful for scientific imaging.

Enthought

It is NOT an image processing library. It is a Python distribution ready for scientific programming. It has over 100 scientific packages pre-installed, and is compiled against fast MKL/BLAS. My tests ran 2-3% faster in Enthought Python than in regular Python. Your mileage may vary. The big advantage, though, is that it has everything I need except Mahotas and PyMorph. Also, Mahotas installs easily on Enthought, whereas it was difficult on the original Python installation due to various dependencies. Enthought is free for students and employees of universities and colleges. Others need to pay for the service.

Ravi's view: Personally, I do not have much trouble installing Python modules into the regular Python interpreter. The only time it becomes difficult is when installing scientific packages like Scipy that have numerous dependencies like the Boost libraries. Ready-made binary packages do eliminate the installation steps but are not necessarily well tuned for optimal performance. Enthought is a great alternative that is just as easy to install and yet optimized for performance.

Plotting three variable graph using Matlab

2012-02-08T22:58:00.001-06:00

Recently, a user wanted to visualize the effect of four different test conditions causing changes in three different parameters. This visualization will help understand the effect of change in one parameter on others.

The user suggested plotting the three parameters along three different axes. For example, three parameters with values of [95.0, 1.2, 4.5] will correspond to the coordinates [95.0, 0, 0], [0, 1.2, 0] and [0, 0, 4.5]. Using these coordinates, one can form a triangle. The shape of the triangle will be different for the various test conditions, making it easier to visualize the effect of the test conditions on the parameters.

I was not initially sure whether I could accomplish this using standard Matlab plots. I did a search on Google but was not successful, as I did not have a good search term. I resorted to creating an OpenGL program using GL_TRIANGLES. I later found that Matlab has similar functionality. Triangles and other polygons can be easily constructed in Matlab using the "patch" function.

The program

The matrix vals contains the values that need to be plotted. Each column is one test condition and the rows contain the parameters that need to be plotted along the axes. The "for loop" runs over each column, creates the x, y and z coordinates and stores them in a, b and c. The patch command creates a triangle using the three coordinates. The last parameter in the patch command is the color of the patch. By default, a patch is rendered with all surfaces opaque and in the color specified in the patch command. Since there are many overlapping surfaces, the patches were made transparent and the edges were given different line styles and thicknesses using "plottools". The resulting plot can be seen below.

% The variable vals has three rows and four columns. The rows contain the
% values along the x, y and z axes respectively. The four columns will
% result in four triangular surfaces.

vals = [11.11,3.55,4.97,2.14;
         2.61,4.63,4.53,4.66;
         91.2,79,93.4,56.3];

figure;
hold on;
for i = 1:size(vals,2) % For each column in vals
   a = [vals(1,i)     0            0];     % x coordinates of the three vertices
   b = [0         vals(2,i)        0];     % y coordinates of the three vertices
   c = [0             0       vals(3,i)];  % z coordinates of the three vertices
   patch(a,b,c,10*i); % The last argument sets the color of the patch
end
grid on;

Calculating the number of repeating objects in an image

2010-10-22T14:23:00.023-05:00

Recently I was asked to help solve the problem of determining the number of seats in an airline seating layout. An example of such an airline layout is shown below (click to view the bigger image).

I decided to use the method I knew best, cross correlation. The idea is to cross correlate the template of the seat (i.e., an image of a single seat) with every pixel in the airline layout image. The template (whose coordinate origin is its center) is moved to a particular pixel in the image. The cross correlation coefficient is calculated and the value is used as the intensity of the corresponding pixel in a new image. This is repeated by moving the template to every pixel in the airline layout image. The pixels at which the template matches the airline layout image perfectly will have a correlation coefficient close to 1.

Above: A hard to see template of the seat.

The result of the cross correlation is shown below (click to view the bigger image).

The bright spots in this image are the points with the highest correlation. It then becomes a simple process of segmenting the high intensity pixels.

To perform these operations, my natural choice was Matlab and its Image processing toolbox.


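% Read the airline layout and the seat template as grayscale images
im = imread('airline_seating.jpg');
im = rgb2gray(im);

im_template = imread('template.jpg');
im_template = rgb2gray(im_template);

% Normalized cross correlation, then keep only the strong matches
C = normxcorr2(im_template, im);
C1 = C>0.7;

% Each connected region in the thresholded image is counted as one seat
stat = regionprops(C1);
noofseats = size(stat,1);
disp(['Number of seats = ',num2str(noofseats)]);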

The first four statements read the airline layout image and the template image and convert them to grayscale. To obtain the correlation image, I did not have to write my own correlation function, as Matlab already provides one. This function, normxcorr2, takes the template and the airline layout matrix. Once the correlation image is obtained, we segment it on the basis that any pixel with a value greater than 0.7 corresponds to the center of a seat. Since the center of a seat did not segment as a single pixel, I could not simply count the pixels to get the number of seats. Instead, I found the connected regions using regionprops, which returns them as a structure array. The number of elements in the structure array is the number of seats.
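For readers without Matlab, the same three steps (correlate, threshold, count regions) can be sketched in plain Python. This is an illustration under stated assumptions rather than a port of normxcorr2: the function names ncc_map, count_regions and count_matches are hypothetical, images are plain lists of lists, the template is only placed where it fits entirely inside the image (normxcorr2 also scores the padded border), and regions are counted with 8-connectivity.

```python
import math


def ncc_map(image, template):
    """Correlation coefficient of the template at every fully-inside position."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for r in range(ih - th + 1):
        row_scores = []
        for c in range(iw - tw + 1):
            w_vals = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            denom = t_norm * w_norm
            # A flat window has no variation, so its coefficient is set to 0.
            dot = sum(a * b for a, b in zip(t_dev, w_dev))
            row_scores.append(dot / denom if denom else 0.0)
        scores.append(row_scores)
    return scores


def count_regions(mask):
    """Count 8-connected groups of True pixels using a flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


def count_matches(image, template, threshold=0.7):
    """Threshold the correlation map and count the surviving regions."""
    scores = ncc_map(image, template)
    mask = [[s > threshold for s in row] for row in scores]
    return count_regions(mask)
```

The pure-Python loops are far too slow for a real layout image; the sketch is only meant to make the logic behind the Matlab calls explicit.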

Spamming of HTML forms - one case

2010-10-06T13:22:00.010-05:00

Recently I found that a newspaper had switched its online edition from an image based CAPTCHA system to a mathematical puzzle, in order to prevent spamming of its comment section by computer programs. A screen capture of the same can be found below.

The problem with such a system is that the puzzle can easily be solved by a computer, which defeats the purpose of using it to tell humans and computers apart. To test my own skill, I wanted to write a program that can download the page, read it and solve the puzzle as well. Using the information I obtained, I could then post comments without human intervention.

To accomplish this task, I used the usual suspects: Python and the HTML parser BeautifulSoup. BeautifulSoup reads a string of HTML or XML and converts it to a tree. Using the tree, it is easy to navigate through the tags or search for a particular one based on id or name. It is also powerful enough to differentiate tags based on the CSS class in HTML tags.


1. import urllib
2. from BeautifulSoup import BeautifulSoup
3. import string,re

4. doc = urllib.urlopen('http://www.somesight.com/comment/reply/1854565').read()
5. soup = BeautifulSoup(''.join(doc))

6. a = soup.findAll("span",{"class":"field-prefix"})
7. b = a[0].contents[0].split("=")[0].split("+")
8. c = [int(bs) for bs in b]
9. captcha_response = sum(c)
10. print a,captcha_response

11. token1 = soup.findAll("input",id="edit-captcha-token")
12. token1_val = token1[0]['value']
13. print token1,token1_val

The two important pieces of information that I need are the captcha_response, which is the solution to the mathematical problem, and the captcha_token, a hidden HTML field in the webpage. Line #6 searches for span tags with the class field-prefix. This tag contains the string for the mathematical puzzle that needs to be solved. In line #7, I obtain the contents of this string and split it in order to obtain the individual numbers in a list. Finally, I convert those numbers from strings to integers in line #8 and sum them in line #9.

Line #11 searches the hidden captcha token, stored in the input tag with id="edit-captcha-token".

Armed with these two pieces of information, we can post any name and comment to the form. The comments were moderated, but it would still require a lot of human intervention to clear the spam.

I informed the webmaster of this issue. They have since moved to an image based system. I removed all references to the site in this blog post and in the program in order to preserve their anonymity.

Ajax in django

2010-09-21T14:16:00.026-05:00

Asynchronous JavaScript and XML (Ajax) has become one of the most important tools for web developers in improving the look and feel of their websites. In a normal HTTP request, the client asks for a particular page. The server prepares the page and sends it to the client browser, which renders the page. This approach is still valid for most pages. The problem is that the whole page gets rendered every time the server processes a request. This might be acceptable for some websites, but sites like Google Maps and stock ticker sites would rather modify a small portion of the page than reload it completely. Ajax plays an important role in such applications.

Django provides support for Ajax and my google-fu turned up a few tutorials that were really good. The problem I had with them was that they focused more on the syntax of executing Ajax in Django than on the overall principle. So, in this post I will focus just on the basic principles of Ajax in Django. The anatomy of a typical Django request can be summarized by the image below. The request comes in from the client, the view function processes it and the response is returned using a call to HttpResponse, render_to_response etc.

In a normal Django request, the response is in the form of HTML, CSV or some such file. Ajax needs JSON or XML instead. Hence Django is almost ready for Ajax. The only extra step is to serialize the data into JSON or XML format. Django makes this easy by providing built-in serializers.

For serializing, Django takes two approaches. The first approach is for serializing a list of Django model objects and the second approach is for everything else. The "everything else" could include any dictionary of values that you create and need to pass through serialization. Let's look at each approach separately. Both approaches serialize using the simplejson module built in to Django. In the former, Django provides a separate module and hence you do not have to make an explicit simplejson call. In the latter, you need to make explicit calls to simplejson functions.

In the first approach, we pass the list of django model objects to the serialize function in the serializers module.


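 from django.core import serializers
 from django.http import HttpResponse
 
 # User is assumed to be a model from your own app with a lastname field
 def search(request, query):
  data = serializers.serialize("xml",User.objects.filter(lastname__startswith=query))
  return HttpResponse(data,mimetype='application/xml')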

In the second approach, we need to make an explicit call to simplejson. In the program below, we first load the simplejson module. We then create our dictionary using the values number and image_loc. We then return the JSON value using the simplejson.dumps() function and Django's HttpResponse function. The JavaScript on the Django template will then modify the HTML page according to the values in the JSON object. If you apply the first approach to a dictionary that you created, it will raise the error "'str' object has no attribute '_meta'".


from django.http import HttpResponse
from django.utils import simplejson

def servevalues(request):
    number = 10000
    image_loc = 'http://www.google.com/images/logos/ps_logo2.png'
    # Plain dictionaries are serialized with an explicit simplejson call
    data = {"number":number,"image_loc":image_loc}
    return HttpResponse(simplejson.dumps(data),mimetype='application/json')
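To see what the second approach actually sends to the browser without setting up a Django project, the serialization step can be tried on its own. The sketch below uses Python's standard json module, which offers the same dumps() call as simplejson; the helper name build_payload is hypothetical and is not part of Django.

```python
import json


def build_payload(number, image_loc):
    """Serialize a plain dictionary to the JSON text an Ajax view would return."""
    data = {"number": number, "image_loc": image_loc}
    return json.dumps(data)


payload = build_payload(10000, "http://www.google.com/images/logos/ps_logo2.png")
print(payload)

# The browser-side script performs the reverse step to get the values back.
restored = json.loads(payload)
print(restored["number"], restored["image_loc"])
```

The string held in payload is what would be passed to HttpResponse in the view.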

Remember:
1. Ajax calls to Django are very similar to normal HTTP request calls. Both return some form of textual information. Hence, syntactically they look alike.

2. There are 2 approaches to serializing. The first approach is for Django model objects and the second approach is for any other object / dictionary that you create. In either case, the code that you have to write is minimal and simple, the Django way!

Keep tab on items you lent using borrow-err.com

2010-07-16T11:34:00.011-05:00

In my free time, I have been working on a small but useful and interesting project. It resulted in the site http://www.borrow-err.com/

The reason for creating the site was my forgetfulness. I lend books to others and then forget about them. So, I decided to create a website where I can keep track of items I have lent to others. If you would like to try it, just key in the details of the items and the name of the borrower on the home page. The website will then send you a reminder email every month for the items you lent. If you are lucky and you get your items back, you can remove them from the list using the links provided in the email.

You can access it using any browser. The page is also light enough that it loads fine in mobile browsers. So whether you are at home, at the office or on the road, you can use borrow-err.com to keep track of the items that you lent. Try it and give me your feedback.

Remember: borrowers err, so you need borrow-err.com

October 5th, 2010: The previous version had the bare minimum of styling. Hence I made some changes and I believe the new version is more pleasing to the eye.

Did you know that ... Part 2

2009-11-21T10:48:00.006-06:00

1. Digital imaging was recognized with this year's Nobel Prize in Physics. Willard Boyle and George Smith were honored for their work on the Charge-Coupled Device (CCD), used in a range of scientific instruments including telescopes.

2. There have been many Nobel Prizes awarded to people who used imaging as one of their tools of investigation.

3. The Hough Transform, used in the detection of geometrical shapes, was initially patented by Paul Hough in 1962.

4. NASA's aerogel was used to collect interstellar and comet particles. The aerogel slows the particles, which otherwise travel at high velocity, to a halt without heating them. NASA collected images of the tracks along which the particles traveled and requested help from the public in locating them.

5. OpenShot is a free non-linear video editor for Linux. You can clip, resize and trim movies, add transitions between scenes, change the speed of clips and even apply the "Ken Burns effect".

Matlab alternatives

2009-07-08T09:43:00.003-05:00

Matlab is a powerful programming language for scientific computing. The popularity of Matlab stems from the fact that it is a high level programming language focusing more on faster development and less on syntax. It also has many toolboxes that extend the functionality offered by basic Matlab to other scientific and engineering domains. In spite of all these advantages, it comes with a price: we need to pay for purchasing, installing and maintaining it. Although the price we pay might be easily offset by the benefits, many Matlab-like software packages exist that offer similar, even if not identical, functionality. In this blog post, we will look at some of the alternatives. In all these alternatives, a program written for Matlab can be run directly, within the constraints of the functionality offered by that particular software.

FreeMat
FreeMat is open source software released under the GPL. It is available for most of the common operating systems like Windows, Linux and Mac. Installing the software on a Windows machine was a piece of cake. Its repertoire of functionality is large but it does not have many of the toolboxes offered by Matlab.

Octave
Octave is another software package available under the GPL. The syntax of Octave commands is similar (but not identical) to that of Matlab, and Octave is also highly extensible using C, C++ and Fortran. Unlike FreeMat, Octave comes with many toolboxes for image processing, signal processing, statistics etc. Installing the software on a Windows machine using cygwin is easy, but installing from source on a Linux machine requires expertise due to the many dependencies.

Numpy and Scipy
Numpy and Scipy are Python modules that can be used to perform high level scientific computation like Matlab. Since these modules are based on Python, their syntax is not similar to Matlab's, but due to Python's focus on high level programming, the syntax is generally simple. With the installation of dependencies like the Python Imaging Library (PIL), we can perform image and signal processing. Scipy can also be used to perform optimization and statistics. Prebuilt binary packages are easy to install, but installation from source is cumbersome due to the dependencies.

Python and Abaqus

2009-02-19T13:19:00.015-06:00

Recently I had the opportunity to work with a student who needed to perform a Finite Element Analysis on roughly 400 files using Abaqus. Processing such a large number of files using the graphical user interface would have been impractical. We were happy to learn that Abaqus has a Python scripting module which could help us with the automation.

We set about writing the Python program from the individual function calls given in the manuals. But considering the scope of the software and its complexity, this approach quickly became difficult. Instead, we resorted to recording "macros" and modifying them for our purpose.

Creating Macros

The macros in Abaqus let you perform a series of operations and record them as Python scripts. By default, the scripts are stored in "abaqusMacros.py", with each macro recorded as a function. Since the macro was created for a particular model, we modified the names in the function "Entire_Work_Flow" to be generic so that other models can be loaded. We then added other functions that call the function created by the macro.

One of the added functions was "getvalues", which obtains the relevant von Mises stress values from the ODB file. The main function reads each solid model (.sat file) in a given directory and passes the filename to the macro function. It then calls the getvalues function and stores the result in a CSV file for further analysis.
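The batch loop described above can be outlined roughly as follows. This is only a sketch under assumptions, not the script we ran: run_model stands in for the recorded macro function and get_values for the function that reads the stresses from the ODB file. Both are passed in as placeholders, so the loop contains no Abaqus calls, and the CSV layout and the "<job name>.odb" naming are assumed. It is written for a current Python 3 interpreter.

```python
import csv
import os


def process_directory(model_dir, csv_path, run_model, get_values):
    """Run one analysis per .sat file and collect the results in a CSV file.

    run_model and get_values are hypothetical stand-ins for the recorded
    macro and the ODB reader; neither is an Abaqus API call.
    """
    sat_files = sorted(f for f in os.listdir(model_dir)
                       if f.lower().endswith(".sat"))
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["model", "von_mises_stress"])
        for filename in sat_files:
            job_name = os.path.splitext(filename)[0]
            # The macro derives the part, instance and job names from the file.
            run_model(os.path.join(model_dir, filename), job_name)
            # Assumed naming: the job writes <job_name>.odb.
            for value in get_values(job_name + ".odb"):
                writer.writerow([job_name, value])
    return sat_files
```

Passing the two functions in as arguments keeps the directory and CSV handling separate from the Abaqus-specific part, which is the part that has to come from the recorded macro.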

To run the script, type "abaqus cae -noGUI scriptname.py".

The Entire_Work_Flow function has been trimmed to show only the relevant lines, which correspond to the creation of the names of the part, instance and job.

Abaqus_python

New Line command

2009-01-26T16:04:00.014-06:00

Once in a while, I come across a task that I need to perform, maybe using a Python script, and then it amazes me to find a tool for it built in to Linux / Unix.

I had a series of numbers (stored as a column) representing a physical quantity measured with respect to time. The data was stored as a text file. I had to add the time (in seconds) column before the column in the file. The first row had time of 1 second, the second row had a time of 2 seconds etc. In short, I just needed to add the line number to each row.

Ordinarily, one could use Python to open the file, read each row and add an incrementing number in front of each row. But with Linux all that I had to do was

nl file1.txt > file2.txt

where 'nl' is the command that numbers lines. The file 'file1.txt' contained the single column of measured values, and 'file2.txt' will contain two columns: the seconds column and the data column.

The nl command can also restart numbering across logical pages, number headers and footers separately etc. Please check the man page for more details.
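For comparison, the Python route mentioned above might look something like this. It is a minimal sketch; the function name and the six-character number column followed by a tab are choices made here for illustration.

```python
def number_lines(in_path, out_path, start=1):
    """Write each row of in_path to out_path prefixed by a running number."""
    with open(in_path) as src, open(out_path, "w") as dst:
        count = start
        for line in src:
            if line.rstrip("\r\n"):
                dst.write("%6d\t%s" % (count, line))
                count += 1
            else:
                # Empty rows are copied through without a number.
                dst.write(line)
    # Number of rows that received a number.
    return count - start
```

Calling number_lines('file1.txt', 'file2.txt') does the same job as the one-line command, which is the point of the post: the shell tool saves writing and testing even this much code.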

Dec 28, 2009
I found a second method to add line numbers:

To number every line, including blank lines, use cat -n file1.txt > file2.txt. If you wish to number only the non-blank lines, use cat -b file1.txt > file2.txt.

xargs - taking output of one command and making input for another

2009-01-23T09:52:00.007-06:00

If you would like to take the output of one command and pass it on to another UNIX program, you are in luck. Like many things in Unix/Linux, there are many ways to perform this operation.

For example if I need to find all jobs that are running in the queue and get complete details, I could perform either

qstat -f `qstat | grep R | awk '{print $1}'`

or

qstat | grep R | awk '{print $1}' | xargs -n1 qstat -f

In the first version, the command within `backticks` is evaluated first to obtain the list of jobs that are "running". Within it, 'qstat | grep R' gives the rows for the running jobs, and the awk command prints the first column of each row, which identifies the job.

In the second version, 'qstat | grep R | awk '{print $1}'' is evaluated and its output is passed to the xargs command, which with -n1 runs 'qstat -f' once for each job. Both versions give the same information here. The xargs form copes better with long lists, because xargs splits its input over several invocations instead of building one command line that may exceed the system limit.

We will continue further with the use of xargs and calculate the total time all the currently running jobs would need.

qstat | grep R | awk '{print $1}' | xargs -n1 qstat -f | grep Resource_List.walltime | awk '{split($3,a,":");sum+=a[1]}END{print sum}'

As seen earlier, the output of the xargs command becomes the input to the 'grep Resource_List.walltime' command. The output of this command is the rows that contain the wallclock time of each of these jobs. From each row, awk takes just the 3rd column ($3 in the command), which contains the wallclock time formatted as hh:mm:ss.

This string is split at ":" and the first number, namely the hours, is obtained as "a[1]" in the command. This is repeated for each job, and at the end the total is printed. Note that only the hour field is summed; minutes and seconds are ignored.
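If the minutes and seconds matter, the same parsing is easy to do in Python. The sketch below assumes the 'qstat -f' output has already been captured as text and that each walltime row has the form "Resource_List.walltime = hh:mm:ss", which is what the awk command above relies on. The function name is made up for this example.

```python
def total_walltime_hours(qstat_output):
    """Sum the Resource_List.walltime values found in `qstat -f` text."""
    total_seconds = 0
    for line in qstat_output.splitlines():
        if "Resource_List.walltime" not in line:
            continue
        # Assumed row format: "Resource_List.walltime = 12:30:00".
        value = line.split("=", 1)[1].strip()
        hours, minutes, seconds = (int(part) for part in value.split(":"))
        total_seconds += hours * 3600 + minutes * 60 + seconds
    return total_seconds / 3600.0
```

Unlike the awk one-liner, this adds up the full hh:mm:ss value and returns the total in hours.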

I will keep posting more of such tidbits in the future ....

Creating module files

2009-01-22T08:53:00.007-06:00

A complex program in Linux will generally be installed in multiple locations. For example, a C++ library like Magick++ will, after installation, provide include files, library files and binary files. Each of these is located in a different folder, possibly under different parent directories as well.

In one of our installations, Magick++ is installed in /usr/local/magick++/magick++/. Under this directory, the include files are in include, the library files are in lib and the binary files are in bin.

For a user program that uses the Magick++ library, the paths to all three have to be in the user's environment variables. This is set using a module file.

A module file is a TCL script. An example of the module file for adding path to magick++ is given below.


#%Module######################################################################
##
##  Magick++ Module
##
proc ModulesHelp { } {
    puts stderr "\tThis module adds PATH that allow you to compile Magick++"
}

set MAGICK_LIB_HOME "/usr/local/magick++/magick++/lib"
set MAGICK_BIN_HOME "/usr/local/magick++/magick++/bin"
set MAGICK_INCLUDE_HOME "/usr/local/magick++/magick++/include"

append-path PATH $MAGICK_BIN_HOME
append-path LD_LIBRARY_PATH $MAGICK_LIB_HOME
append-path LD_INCLUDE_PATH $MAGICK_INCLUDE_HOME

The module file begins with #%Module, which identifies it as a module file. The proc ModulesHelp prints a helpful message whenever "module help magick++" is typed at the Linux command prompt. The next three lines create variables that store the locations of the lib, bin and include directories. Finally, these paths are appended to the environment variables PATH, LD_LIBRARY_PATH and LD_INCLUDE_PATH respectively.

To invoke this module file and add all these paths to the environment variables, type "module load magick++" or "module add magick++" at the Linux command line.

To remove these paths from the environment variables, type "module unload magick++".

In addition to appending path, we can also prepend paths, set and unset environment variables, set and unset aliases etc. Refer to the manpage for more details.
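To make the difference between appending and prepending concrete, here is a small Python illustration of what the two operations do to a colon-separated variable. It is not how the module command is implemented; the function names and the decision to skip directories that are already present are assumptions of this sketch.

```python
def append_path(env, name, directory):
    """Add directory to the end of a colon-separated variable in env."""
    parts = [p for p in env.get(name, "").split(":") if p]
    if directory not in parts:  # sketch choice: do not add duplicates
        parts.append(directory)
    env[name] = ":".join(parts)


def prepend_path(env, name, directory):
    """Add directory to the front, so it is searched before the others."""
    parts = [p for p in env.get(name, "").split(":") if p]
    if directory not in parts:
        parts.insert(0, directory)
    env[name] = ":".join(parts)
```

The order matters because the first matching directory wins when a program is looked up: an appended directory is consulted only after everything already in the variable, while a prepended one takes priority.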

Configuring Make file

2009-01-22T08:30:00.008-06:00

Any of us who have used Linux will eventually end up installing software from source. The most common method for installing software written in C, C++, Fortran etc. is using a Makefile.

The Makefile contains the list of commands that will be used to create the various libraries, binaries etc.

In the simplest scenario, the installation will involve

./configure
make
make install
make clean

The first step prepares the Makefile with all the relevant configuration for the system on which the software is being installed. This can include the location where the files will be stored after installation, the type of CPU etc. The second step, "make", compiles the program, and "make install" copies the resulting files to the appropriate locations. "make clean" removes the temporary files created during the build.

Depending on the scenario and the type of software being installed, different configuration options may have to be set. In the example below, we will configure the installation of magick++ (a C++ library for ImageMagick) so that it is installed in /usr/local/magick++/magick++/ instead of the default location /usr/local/.

./configure --with-quantum-depth=8 --prefix=/usr/local/magick++/magick++/ --exec-prefix=/usr/local/magick++/magick++/

--prefix is the root location for the architecture-independent files, such as the include files, after installation. If not specified, it is assumed to be /usr/local.

--exec-prefix is the root location for the architecture-dependent files: the bin files and, in the standard layout, the lib files.

By default, --exec-prefix takes the same value as --prefix.
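The way the two options combine can be written down as a short Python function. This is only an illustration of the usual GNU layout (bin and lib under the exec-prefix, include under the prefix); an individual package may override any of these directories, and the function name is made up for this example.

```python
import posixpath


def install_dirs(prefix="/usr/local", exec_prefix=None):
    """Return the default install directories for a GNU-style configure."""
    if exec_prefix is None:
        # --exec-prefix falls back to --prefix when it is not given.
        exec_prefix = prefix
    return {
        "bin": posixpath.join(exec_prefix, "bin"),
        "lib": posixpath.join(exec_prefix, "lib"),
        "include": posixpath.join(prefix, "include"),
    }
```

With the configure line used above, both options point to /usr/local/magick++/magick++/, so bin, lib and include all end up under that one directory.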

There are many other configuration parameters that can be set, which again depend on the software. I will keep posting more such configurations.

Spell checker in Unicode using Python

2009-01-03T17:38:00.025-06:00

I have a project where I had to perform spell check on characters recognized using an optical character recognition program (OCR). My first choice was to search for an existing program written preferably in python, my favorite choice for such work. You can download the complete file here.

Amazingly, I found this work by Peter Norvig. It was a very well documented and well written piece of code.

But there were a few issues that I needed to fix, so I could not use it directly.

1. In my program, Unicode needs to be the default for all input and output, unlike Peter's program, which works on ASCII.

This is performed in the following code

#!/usr/bin/python -Wall
# -*- coding: utf-8 -*-

import re, collections, pprint,os
import sys
import codecs

if __name__ == '__main__':
    ...
    reload(sys)
    sys.setdefaultencoding('iso8859-1')

2. The alphabet also includes the accented characters applicable in my situation:
alphabet = u'abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿß'

3. The unicode feature of python is smart enough to recognise the right characters for conversion from upper case to lower case. All that needs to be done is to call the .lower() function on any unicode characters in the following function.

def words(text): return re.findall(u'[abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿß]+',text.lower())
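For readers on Python 3, where every string is already Unicode, the same tokenizer needs neither the u prefix nor the default-encoding change. The following is a sketch of that variant; the constant names ALPHABET and WORD_PATTERN are introduced here and are not part of the original program.

```python
import re

# Same character set as the alphabet above; Python 3 strings are Unicode.
ALPHABET = "abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿß"
WORD_PATTERN = re.compile("[" + ALPHABET + "]+")


def words(text):
    """Return the lower-cased words made up of characters in ALPHABET."""
    return WORD_PATTERN.findall(text.lower())
```

Building the pattern from the alphabet string keeps the two definitions from drifting apart when a character is added.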

4. Peter's program trains on words by estimating the probability of their occurrence. In simple terms, it counts the number of times each word appears in a standard piece of text. The larger the piece of text, the more representative it is of the real world. This approach did not fit my case, as I do not have a piece of text in which words are repeated multiple times.

In my case, I have a list of words in a text file, and almost every word appears only once. So I ranked words not by frequency but by ordinality.

The ord function in Python returns the Unicode code point of a character. In the function below, the ordinality of a word is the sum of the code points of its characters. I first determine the ordinality of each of the possible candidates (i.e., the original set of words) and then the ordinality of the word to be spell checked. The absolute difference between each candidate's ordinality and that of the word is computed, and the position of the lowest value gives the position of the correct word in the candidates.

def best_candidate(candidates,word):
     clist = list(candidates)
     #Find ordinality for the complete list
     so = []
     for cl in clist:
         sum_ord = 0
         for c in cl:
             sum_ord = sum_ord+ord(c)
         so.append(sum_ord)

     #Find ordinality of the given word
     sum_ord = 0
     for c in word:
         sum_ord = sum_ord+ord(c)

     #Find difference in ordinality and also lowest value location
     so_item_l = []
     for so_item in so:
         so_item_l.append(abs(so_item-sum_ord))
     min_loc = so_item_l.index(min(so_item_l))

     return clist[min_loc]
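The same selection can be written more compactly. The version below is a sketch for Python 3 that should behave like the function above when given a list of candidates; ordinal_sum is a helper name introduced here.

```python
def ordinal_sum(word):
    """Sum of the Unicode code points of the characters in word."""
    return sum(ord(ch) for ch in word)


def best_candidate(candidates, word):
    """Return the candidate whose ordinal sum is closest to that of word."""
    target = ordinal_sum(word)
    # min() returns the first candidate in case of a tie.
    return min(candidates, key=lambda c: abs(ordinal_sum(c) - target))
```

One property worth keeping in mind with this measure is that words made of the same letters in a different order have the same ordinal sum, so the order of the candidates decides between them.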

Convert a series of Matlab figure files in to jpg

2008-07-08T10:12:00.008-05:00

This program converts a set of fig files located in a directory into jpg files. The program runs through the directory and gets a list of fig file names. These files are then read using the ‘openfig’ command and the output is written using the ‘saveas’ command. For example, a file named ‘image.fig’ will be read and saved as ‘image.fig.jpg’. The only variable that needs to be changed is figdirectory.

Bonus: You can use the parsing directory structure to read any kind of files located in a directory.

%%Matlab program - fig2jpg
figdirectory = 'C:\Documents and Settings\username\Desktop'
fullpath = sprintf('%s/*.fig',figdirectory)
d = dir(fullpath);
length_d = length(d)
if(length_d == 0)
    disp('couldnt read the directory details\n');
    disp('check if your files are in correct directory\n');
end
startfig = 1
endfig = length_d
for i = startfig:endfig
    fname = d(i).name;
    fname_input = sprintf('%s/%s',figdirectory,fname)
    fname_output = sprintf('%s/%s.jpg',figdirectory,fname)
    saveas(openfig(fname_input),fname_output,'jpg');
end
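The directory-parsing part of the program is not specific to Matlab. As a rough Python equivalent, the sketch below lists the fig files in a directory and builds the matching output names in the same way (‘image.fig’ becomes ‘image.fig.jpg’). It does not open or convert the figures, which still needs Matlab, and the function name is made up for this example.

```python
import glob
import os


def fig_to_jpg_names(figdirectory):
    """Pair every .fig file in figdirectory with its .jpg output name."""
    pairs = []
    pattern = os.path.join(figdirectory, "*.fig")
    for fname_input in sorted(glob.glob(pattern)):
        # Same naming rule as the Matlab program: append ".jpg".
        pairs.append((fname_input, fname_input + ".jpg"))
    return pairs
```

Changing the "*.fig" pattern is all that is needed to walk through any other kind of file.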



Setting automatic jobs in Linux
2008-05-13T21:53:00.003-05:00
I needed to monitor the machines under my control, and the natural thing was to set up a cron job in Linux. I was new to the crontab command but, as you will soon see, the command is very simple and should be easy to master. There are some nuances that I learnt during the course of the work, and I will discuss them at the end.

A cron job is defined by the time and date at which it needs to run. These parameters are set in a crontab file, which can be edited by calling the command crontab -e. In the editor, type the following line

5   *   *   *   *   command-to-execute > logfile.log 

The number 5 represents the minute at which the command has to be executed. The subsequent values represent the hour, day of the month, month and day of the week (Sunday = 0) respectively. The value * matches every possible value of that field, so the line above runs the command at 5 minutes past every hour. The > redirects the output of the command to a log file. Once the crontab file has been set, it can be verified using crontab -l. The crontab entries can also be written to a text file and loaded using crontab cronfile.txt. The content of cronfile.txt is the same as above.
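To make the field order easy to remember, here is a small Python sketch that splits a crontab line into its named parts. It handles only the plain five-field form shown above, and the names FIELD_NAMES and parse_cron_line are invented for this illustration.

```python
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron_line(line):
    """Split a crontab entry into its five time fields and the command."""
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five time fields followed by a command")
    schedule = dict(zip(FIELD_NAMES, parts[:5]))
    # Everything after the fifth field, redirection included, is the command.
    schedule["command"] = parts[5].strip()
    return schedule
```

Applied to the line above, this gives "5" for the minute, "*" for the other four time fields, and "command-to-execute > logfile.log" as the command.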

If you wish to receive a mail every time the cron job runs, add the following line first

MAILTO=johndoe@email.com

(Of course, replace it with the correct email address.)

So far, everything is easy and good but be careful with the following,

1. Ensure that there is an empty line after the last line of the crontab file.
2. Cron runs jobs with a minimal environment and does not use the PATH set up in your login shell. So ensure that all commands in the crontab file, and all the commands and scripts called from it, are referred to by their full path.

   So instead of calling
   *   2   *   *   *   python     /path/to/file/pythonfile.py

   call
   *   2   *   *   *   /usr/bin/python    /path/to/file/pythonfile.py

Also make sure that all the paths used inside the Python file are given as full paths.


Optical Character Recognition (OCR) using Tesseract
2008-04-09T15:37:00.001-05:00
Recently I had a very interesting project from a client who wanted to extract the text (in my case numbers) embedded in thousands of images. My natural choice was to use Tesseract, as it can be scripted and applied to many images in sequence.

Tesseract is OCR software originally developed by Hewlett Packard and currently developed by Google. It is open source software released under the Apache license. Since it is open source, you can get your hands on it and install it on pretty much any operating system. I installed it on a Windows machine under Cygwin and the installation was a breeze.

Tesseract does not have any segmentation methods or document layout analysis, and can only output the recognised text to a file. But its accuracy is good enough for many applications. It was ranked among the top three OCR engines in 1995. Making a call to tesseract is also easy

tesseract data.bmp text.dat [-l langid]

The values within [] are optional. The langid is the language being recognized. The default language is English, but French, Italian, German and Spanish are also supported.
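Since the appeal of Tesseract here is that it can be scripted, a Python wrapper around the call above might look like the following. It is a sketch that assumes the tesseract executable is on the PATH; the function names and the ".out" naming of the output files are choices made for this example.

```python
import subprocess


def tesseract_command(image_path, output_base, langid=None):
    """Build the argument list for: tesseract image output [-l langid]."""
    command = ["tesseract", image_path, output_base]
    if langid:
        command += ["-l", langid]
    return command


def recognise_all(image_paths, langid=None):
    """Run tesseract once per image, stopping at the first failure."""
    for image_path in image_paths:
        # Assumed naming scheme: one output base per input image.
        output_base = image_path + ".out"
        subprocess.check_call(tesseract_command(image_path, output_base, langid))
```

Passing the arguments as a list rather than a single string avoids quoting problems when file names contain spaces.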

Since Tesseract does not have any segmentation methods to separate the text from the background, the user has to apply these methods using other software like ImageMagick, ITK etc. The most common segmentation technique for scanned documents is local adaptive thresholding. It takes into account the variation in background intensity across the scanned image and thresholds accordingly. But the right technique has to be chosen depending on the type of image being recognized.
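To show the idea behind local adaptive thresholding, here is a deliberately simple pure-Python sketch of one common variant, which compares each pixel with the mean of its neighbourhood. It assumes dark text on a lighter background and a greyscale image given as a list of rows. The radius and offset parameters are illustrative, and a real pipeline would use the implementation in ImageMagick or ITK instead.

```python
def adaptive_threshold(image, radius=1, offset=0):
    """Mark a pixel as text (1) if it is darker than its local mean - offset."""
    height = len(image)
    width = len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Neighbourhood around (x, y), clipped at the image border.
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            local_mean = sum(window) / float(len(window))
            row.append(1 if image[y][x] < local_mean - offset else 0)
        result.append(row)
    return result
```

Because the threshold is computed per neighbourhood, a dark pixel is detected equally well on a bright or a dim part of the page, which a single global threshold cannot guarantee.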

Possible additions to Tesseract
All the text that I needed to decipher in my images was numeric, but Tesseract does not have a langid for numbers. As a result, Tesseract deciphered some of these numbers as letters. If I have time in the future, I will work on creating a langid for numbers, as it would be helpful for many people. If you think it might be helpful for you, I encourage you to create one, or contact me and we can work together.


Summary of all chapters
2008-03-27T16:06:00.002-05:00
Currently I am busy with my other projects and I am not able to continue working on chapters on Draw and Pixel class and also on other useful small projects that you could perform. So meanwhile, I am uploading a file which has all my writing thus far. Enjoy reading.


Magick++ chapter 1 to 6


Chapter 6: Image class – Part 2
2008-03-24T08:59:00.001-05:00
In chapter 4, we looked at an overview of the Image class and discussed in detail a few functions for filtering images, using example programs. In this chapter, we will discuss a higher-level image processing operation, segmentation. We will first demonstrate a program for adaptive thresholding and then a program for fuzzy c-means clustering. Each algorithm will be explained and then the program will be walked through to ensure complete understanding.


Continue reading the pdf file ...

