Saturday, November 21, 2009

Did you know that ... Part 2

1. Digital imaging received this year's Nobel Prize in Physics. Willard
Boyle and George Smith were recognized for their work on Charge-Coupled
Devices (CCDs), which are used in a range of scientific instruments, including telescopes.

2. Many Nobel Prizes have been awarded to people who used imaging as one of their tools of investigation.

3. The Hough Transform, used in the detection of geometrical shapes, was initially patented by Paul Hough in 1962.

4. NASA used aerogel to collect interstellar and comet particles. The aerogel slows the particles, otherwise traveling at high velocity, to a halt without heating them. NASA collected images of the tracks along which the particles traveled and asked the public for help in locating them.

5. OpenShot is a free non-linear video editor for Linux. You can clip, resize and trim movies, add transitions between scenes, change the speed of clips and even apply the "Ken Burns" effect.

Wednesday, July 8, 2009

Matlab alternatives

Matlab is a powerful programming language for scientific computing. Its popularity stems from the fact that it is a high-level language that focuses more on fast development and less on syntax. It also has many toolboxes that extend the functionality of basic Matlab into other scientific and engineering domains. In spite of all these advantages, it comes at a price: we need to pay for purchasing, installing and maintaining it. Although that price may be easily offset by the benefits, many Matlab-like software packages exist that offer a similar, even if not identical, set of functionality. In this post, we will look at some of the alternatives. In the first two, a program written for Matlab can often be run directly, within the constraints of the functionality offered by that particular package.

FreeMat is an open-source package released under the GPL. It is available for most common operating systems, including Windows, Linux and Mac. Installing it on a Windows machine was a piece of cake. Its repertoire of functionality is large, but it lacks many of the toolboxes offered by Matlab.

Octave is another package available under the GPL. The syntax of Octave commands is similar to Matlab's (though not identical), and Octave is highly extensible using C, C++ and Fortran. Unlike FreeMat, Octave comes with many toolboxes for image processing, signal processing, statistics, etc. Installing it on a Windows machine using Cygwin is easy, but installing from source on a Linux machine requires expertise due to the many dependencies.

Numpy and Scipy
NumPy and SciPy are Python modules that can be used to perform high-level scientific computation, much like Matlab. Since these modules are based on Python, their syntax is not similar to Matlab's, but thanks to Python's focus on high-level programming the syntax is generally simple. With the installation of dependencies like the Python Imaging Library (PIL), we can perform image and signal processing. SciPy can also be used for optimization and statistics. Prebuilt binary packages are easy to install, but installation from source is cumbersome due to the dependencies.
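As a small illustration of the Matlab-like feel of NumPy, here is a sketch (the arrays and values are my own, not taken from any particular Matlab script):

```python
import numpy as np

# Matlab-style array creation
t = np.linspace(0, 1, 5)          # like linspace(0, 1, 5) in Matlab
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Vectorized, element-wise operation (no explicit loop)
y = np.sin(2 * np.pi * t)

# Matrix-vector product (A * v in Matlab)
b = A.dot(np.array([1.0, 1.0]))

print(b)                          # → [3. 7.]
```

The main syntactic difference from Matlab is zero-based indexing and the explicit np. namespace in front of every function.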

Thursday, February 19, 2009

Python and Abaqus

Recently I had the opportunity to work with a student who needed to perform a finite element analysis on roughly 400 files using Abaqus. Processing such a large number of files through the graphical user interface would have been impractical. We were happy to learn that Abaqus has a Python scripting module that could help us automate the task.

We set about writing the Python program from the individual function calls given in the manuals. But considering the scope and complexity of the software, this approach quickly became difficult. Instead, we resorted to creating "macros" and modifying them for our purpose.

Creating Macros

The macro facility in Abaqus lets you perform a series of operations and records them as Python scripts. By default the scripts are stored in "", with each macro recorded as a function. Since the macro was created for a particular model, we modified the names in the function "Entire_Work_Flow" to be generic, so that other models could be loaded. We then added other functions that call the function created by the macro.

The other function we created was "getvalues", which obtains the relevant von Mises stress values from the ODB file. The main function reads each solid model (.sat file) in a given directory and passes the filename to the macro function. It then calls getvalues and stores the result in a CSV file for further analysis.

To run the script without the graphical interface, type "abaqus cae noGUI=<script name>".

The Entire_Work_Flow function has been trimmed to show only the relevant lines, which correspond to the creation of the part, instance and job names.
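In outline, the main function looked something like the sketch below. This is not the actual script: run_model and getvalues here are hypothetical stand-ins for the macro function and the ODB-reading code, which only run inside the Abaqus Python environment; only the directory loop and CSV bookkeeping are shown.

```python
import csv
import glob
import os

def run_model(sat_file):
    # Stand-in for the macro function (Entire_Work_Flow): in the real script
    # this imports the .sat part, creates the instance and submits the job.
    # Here it just returns a hypothetical ODB file name.
    return os.path.splitext(sat_file)[0] + '.odb'

def getvalues(odb_file):
    # Stand-in for reading the relevant von Mises stress values from the ODB.
    return 0.0

def process_directory(directory, out_csv):
    # Read each solid model (.sat file), run the analysis and record the result.
    with open(out_csv, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['model', 'von_mises'])
        for sat_file in sorted(glob.glob(os.path.join(directory, '*.sat'))):
            odb = run_model(sat_file)
            writer.writerow([os.path.basename(sat_file), getvalues(odb)])
```

With the stand-ins replaced by the real Abaqus calls, process_directory is what lets the 400 files run unattended.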


Monday, January 26, 2009

The nl command

Once in a while, I come across a task that I need to perform, perhaps using a Python script, and then it amazes me to find a tool already built into Linux/Unix.

I had a series of numbers (stored as a column) representing a physical quantity measured with respect to time. The data was stored in a text file, and I had to add a time (in seconds) column before the data column. The first row had a time of 1 second, the second row a time of 2 seconds, and so on. In short, I just needed to add the line number to each row.

Ordinarily, one could use Python to open the file, read each row and add an incrementing number in front of it. But with Linux, all I had to do was

nl file1.txt > file2.txt

where 'nl' is the number-lines command. The file 'file1.txt' contains one column of numbers, the measured values, and 'file2.txt' will contain two columns: the seconds column and the data column.

The nl command can also work across pages and can number headers, footers, etc. Please check the man page for more details.
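For comparison, the Python script that nl replaced would look something like this sketch (number_lines is my own name for the function):

```python
def number_lines(in_path, out_path):
    # Pure-Python equivalent of "nl file1.txt > file2.txt": prefix each
    # line with its line number. (Note: this numbers every line, like
    # cat -n; nl by default skips blank lines.)
    with open(in_path) as src, open(out_path, 'w') as dst:
        for number, line in enumerate(src, start=1):
            dst.write('%6d\t%s' % (number, line))
```

Calling number_lines('file1.txt', 'file2.txt') then produces the same two-column file, which shows how much typing the one-line nl command saves.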

Dec 28, 2009
I found a second method to add line numbers:

To add line numbers, including blank lines, use cat -n file1.txt > file2.txt. If you wish to number only non-blank lines, use cat -b file1.txt > file2.txt.

Friday, January 23, 2009

xargs - taking output of one command and making input for another

If you would like to take the output of one command and pass it on to another UNIX program, you are in luck. Like many things in Unix/Linux, there are many ways to perform this operation.

For example if I need to find all jobs that are running in the queue and get complete details, I could perform either

qstat -f `qstat | grep R | awk '{print $1}'`

or

qstat | grep R | awk '{print $1}' | xargs -n1 qstat -f

In the first version, the command within `backticks` is evaluated to obtain the list of jobs that are "running": 'qstat | grep R' gives one row per running job, and the awk command splits each row and extracts the job name (the first column).

In the second version, 'qstat | grep R | awk '{print $1}'' is evaluated and the output is piped to the xargs command, which runs 'qstat -f' once per job name. The two versions are equivalent, except that the second is more robust in handling whitespace and null characters in the output before it becomes input to the next command.

We will continue with xargs and calculate the total walltime that all the currently running jobs have requested.

qstat | grep R | awk '{print $1}' | xargs -n1 qstat -f | grep Resource_List.walltime | awk '{split($3,a,":");sum+=a[1]}END{print sum}'

As seen earlier, the output of the xargs command becomes the input to the 'grep Resource_List.walltime' command, whose output is the set of rows containing the wallclock time for each job. These rows are then parsed to obtain just the third column ($3 in the command), which contains the wallclock time formatted as hh:mm:ss.

This string is split on ":" and the first field, namely the hours, is obtained as "a[1]" in the command. This process is repeated for each job, and at the end the total sum is printed.
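The split-and-sum step of the awk command can be sketched in Python as follows (the walltime strings below are made-up examples):

```python
def total_hours(walltimes):
    # Mirror awk's split($3, a, ":"); sum += a[1] over a list of
    # "hh:mm:ss" walltime strings: take the hour field and add it up.
    total = 0
    for wt in walltimes:
        hours = wt.split(':')[0]   # the part before the first ':'
        total += int(hours)
    return total

print(total_hours(['12:00:00', '48:30:00', '06:15:00']))  # → 66
```

Like the awk version, this ignores the minutes and seconds fields, so the result is a lower bound on the total requested time.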

I will keep posting more of such tidbits in the future ....

Thursday, January 22, 2009

Creating module files

A complex program in Linux is generally installed in multiple locations. For example, a C++ library like Magick++ will, after installation, provide include files, library files and binary files. Each of these is located in a different folder, and possibly in different parent directories as well.

In one of our installations, Magick++ is installed under /usr/local/magick++/magick++/. Within this directory, the include files are in include, the library files in lib and the binary files in bin.

For a user-defined program using the Magick++ library, the paths to all three have to be in the user's environment variables. This can be set using a module file.

A module file is a TCL script. An example of the module file for adding path to magick++ is given below.

#%Module
## Magick++ Module
proc ModulesHelp { } {
    puts stderr "\tThis module adds paths that allow you to compile with Magick++"
}

set MAGICK_LIB_HOME "/usr/local/magick++/magick++/lib"
set MAGICK_BIN_HOME "/usr/local/magick++/magick++/bin"
set MAGICK_INCLUDE_HOME "/usr/local/magick++/magick++/include"

append-path PATH $MAGICK_BIN_HOME
append-path LD_LIBRARY_PATH $MAGICK_LIB_HOME
append-path LD_INCLUDE_PATH $MAGICK_INCLUDE_HOME

The module file begins with #%Module, which identifies it as a module file. The proc ModulesHelp prints a helpful message whenever "module help magick++" is typed at the Linux command prompt. The next three lines create variables that store the locations of the lib, bin and include directories. Finally, these paths are appended to the environment variables PATH, LD_LIBRARY_PATH and LD_INCLUDE_PATH.

To invoke this module file and add all these paths to the environment variables, type "module load magick++" or "module add magick++" at the Linux command line.

To unload these paths and environment variables, type "module unload magick++".

In addition to appending paths, we can also prepend paths, set and unset environment variables, set and unset aliases, etc. Refer to the man page for more details.

Configuring the Makefile

Any of us who have used Linux will eventually end up installing software from source. The most common method of installing software written in C, C++, Fortran, etc. is using a Makefile.

The Makefile contains the list of commands used to create the various libraries, binaries, etc.

In the simplest scenario, the installation involves

./configure
make
make install
make clean

The first step, "./configure", prepares the Makefile with all the relevant configuration for the system on which the software is being installed. This could include the location where the files will be stored after installation, the type of CPU, etc. The second step, "make", compiles the program; "make install" places the compiled files in the appropriate locations; and "make clean" clears any temporary files that were created.

Depending on the scenario and the type of software being installed, different configuration options may have to be set. In the example below, we configure the installation of Magick++ (the C++ library for ImageMagick) so that it is installed in /usr/local/magick++/magick++/ instead of the default location /usr/local/.

./configure --with-quantum-depth=8 --prefix=/usr/local/magick++/magick++/ --exec-prefix=/usr/local/magick++/magick++/

--prefix is the location where the lib and include files will be stored after compiling. If not specified, it defaults to /usr/local.

--exec-prefix is the location where bin files will be located.

By default, --exec-prefix is the same as --prefix.

There are many other configuration parameters that can be set, which again depend on the software. I will keep posting more such configurations.

Saturday, January 3, 2009

Spell checker in Unicode using Python

I had a project where I needed to perform spell checking on words recognized by an optical character recognition (OCR) program. My first choice was to search for an existing program, written preferably in Python, my favorite language for such work. You can download the complete file here.

Amazingly, I found this work by Peter Norvig. It is a very well documented and well written piece of code.

But I had a few issues that I needed to fix, so I could not use it directly.

1. In my program, Unicode needs to be the default encoding for all input and output, unlike Peter's program, which works on ASCII.

This is performed by the following code

#!/usr/bin/python -Wall
# -*- coding: utf-8 -*-

import re, collections, pprint,os
import sys
import codecs

if __name__ == '__main__':
    reload(sys)
    sys.setdefaultencoding('iso8859-1')

2. The alphabet needs to include the Unicode characters applicable in my situation, like
alphabet = u'abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿß'

3. Python's Unicode support is smart enough to pick the right characters when converting from upper case to lower case. All that needs to be done is to call the .lower() method on any Unicode string, as in the following function.

def words(text): return re.findall(u'[abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿß]+',text.lower())

4. Peter's program trains on words by determining the probability of their occurrence. In simple terms, it counts the number of times a word appears in a standard piece of text. The larger the piece of text, the more representative it is of the real world. This was not true in my case, as I did not have a piece of text in which words are repeated many times.

In my case, I have a list of words in a text file, and almost every word appears exactly once. So a word was ranked not by its frequency but by its ordinality.

The ord function in Python returns the Unicode code point of a character. In the function below, I first determine the ordinality (the sum of code points) of each word in the list of possible candidates (i.e., the original set of words). Then the ordinality of the word to be spell checked is found. The difference between the two ordinalities is computed, and the position of the lowest value gives the location of the correct word among the candidates.

def best_candidate(candidates, word):
    clist = list(candidates)
    # Find the ordinality (sum of code points) of each candidate
    so = []
    for cl in clist:
        sum_ord = 0
        for c in cl:
            sum_ord = sum_ord + ord(c)
        so.append(sum_ord)

    # Find the ordinality of the given word
    word_ord = 0
    for c in word:
        word_ord = word_ord + ord(c)

    # Find the difference in ordinality and the location of the lowest value
    so_item_l = []
    for so_item in so:
        so_item_l.append(abs(so_item - word_ord))
    min_loc = so_item_l.index(min(so_item_l))

    return clist[min_loc]