Showing posts with label unix. Show all posts

Monday, January 26, 2009

nl - the number lines command

Once in a while I come across a task that I assume will need a Python script, and then I am amazed to find a tool for it already built into Linux / Unix.

I had a series of numbers (stored as a single column) representing a physical quantity measured over time. The data was stored in a text file. I had to add a time column (in seconds) in front of the existing column. The first row had a time of 1 second, the second row a time of 2 seconds, and so on. In short, I just needed to prefix each row with its line number.

Ordinarily, one could use Python to open the file, read each row, and write an incrementing number in front of it. But on Linux all I had to do was

nl file1.txt > file2.txt

where 'nl' is the number lines command. The file 'file1.txt' contains the single column of measured values, and 'file2.txt' will contain two columns: the seconds column and the data column.
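
As a small sketch of the effect, the commands below build a three-line stand-in for 'file1.txt' in a temporary directory and number it with nl. The measured values and the scratch directory are made up for illustration.

```shell
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)

# Made-up measurements, one value per line (stand-in for file1.txt).
printf '0.12\n0.47\n0.91\n' > "$workdir/file1.txt"

# Prefix each line with its line number.
nl "$workdir/file1.txt" > "$workdir/file2.txt"

cat "$workdir/file2.txt"
```

Each output line holds the line number, a tab, and then the original value, so the second row pairs 2 with 0.47.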

The nl command can also treat its input as a series of logical pages, with separate numbering rules for the header, body, and footer sections of each page. Please check the man page for more details.
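
A rough sketch of the section handling, assuming the standard delimiter lines described in the nl man page and default options. The section text is invented for the example.

```shell
workdir=$(mktemp -d)

# '\:\:\:' starts a header, '\:\:' a body, '\:' a footer section.
# The section text itself is made up for illustration.
printf '%s\n' '\:\:\:' 'Title' '\:\:' 'first' 'second' '\:' 'End' > "$workdir/sections.txt"

# With default options only the body lines receive numbers.
numbered=$(nl "$workdir/sections.txt")
printf '%s\n' "$numbered"
```

Here only 'first' and 'second' are numbered; the header and footer lines are passed through without a number unless the -h and -f options say otherwise.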

Dec 28, 2009
I found a second method to add line numbers:

To number every line, including blank ones, use cat -n file1.txt > file2.txt. If you wish to number only the non-blank lines, use cat -b file1.txt > file2.txt. With its default options, nl also leaves blank lines unnumbered.
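
The difference between the two options shows up as soon as the input has a blank line. A minimal sketch, using a made-up three-line file in a temporary directory:

```shell
workdir=$(mktemp -d)

# Made-up sample with a blank line in the middle.
printf 'alpha\n\nbeta\n' > "$workdir/file1.txt"

# -n numbers all three lines; the blank line gets number 2.
cat -n "$workdir/file1.txt" > "$workdir/all.txt"

# -b numbers only the two non-blank lines.
cat -b "$workdir/file1.txt" > "$workdir/nonblank.txt"

cat "$workdir/all.txt" "$workdir/nonblank.txt"
```

With -n the line 'beta' is numbered 3, while with -b it is numbered 2.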

Friday, January 23, 2009

xargs - taking output of one command and making input for another

If you would like to take the output of one command and pass it on to another UNIX program, you are in luck. Like many things in Unix/Linux, there are many ways to perform this operation.

For example, if I need to find all the jobs that are running in the queue and get their complete details, I could run either

qstat -f `qstat | grep R | awk '{print $1}'`

or

qstat | grep R | awk '{print $1}' | xargs -n1 qstat -f

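In the first version, the command within `backticks` is evaluated first, and its output, the list of jobs that are "running", becomes the arguments of the outer 'qstat -f'. Here 'qstat | grep R' keeps the rows that contain the letter R, which is how qstat marks a running job (it will also match any other row that happens to contain an R, for example in a job name). The awk command then prints the first column of each row, which identifies the job.

In the second version, the output of 'qstat | grep R | awk '{print $1}'' is piped to the xargs command, which turns each job into an argument for 'qstat -f'. Both versions report the same details, with one difference: because of -n1, xargs runs 'qstat -f' once per job, whereas the backtick version runs it once with all the jobs as arguments. The xargs form also copes better with very long lists, since it can spread the arguments over several invocations instead of exceeding the command-line length limit. Note that plain xargs still splits its input on whitespace; items that contain whitespace need null-separated input and the -0 option, where available.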
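
Since qstat is only available on a cluster, here is a rough sketch of the two forms using echo in place of 'qstat -f'. The function 'jobs_list' and the job IDs it prints are hypothetical stand-ins for the output of the awk stage.

```shell
# Stand-in for the job list; these IDs are made up for illustration.
jobs_list() { printf '101.server\n102.server\n103.server\n'; }

# Command substitution: a single invocation receives all IDs as arguments.
one_call=$(echo "details for:" $(jobs_list))

# xargs -n1: one invocation per ID.
per_job=$(jobs_list | xargs -n1 echo "details for:")

printf '%s\n' "$one_call"
printf '%s\n' "$per_job"
```

The first form prints one line listing all three IDs; the second prints three lines, one per ID.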

We will continue with xargs and calculate the total wallclock time listed for all the currently running jobs.

qstat | grep R | awk '{print $1}' | xargs -n1 qstat -f | grep Resource_List.walltime | awk '{split($3,a,":");sum+=a[1]}END{print sum}'

As seen earlier, the output of the xargs command becomes the input to the 'grep Resource_List.walltime' command. That grep keeps only the rows that hold the wallclock time of each job. From these rows, awk takes the 3rd column ($3 in the command), which contains the wallclock time formatted as hh:mm:ss.

This string is split on ":" and the first piece, namely the hours, is available as "a[1]" in the command. This process is repeated for each job, and at the end the total is printed. Note that only the hours are added up, so the minutes and seconds of each job are ignored.
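
To try the awk step without a job queue, the sketch below feeds it three made-up walltime lines in the shape the grep stage would emit. The second awk program is my own variant, not part of the command above: it converts each hh:mm:ss to seconds so that minutes and seconds are counted too.

```shell
# Made-up sample of what the grep stage emits, one line per job.
walltimes='    Resource_List.walltime = 12:00:00
    Resource_List.walltime = 02:30:00
    Resource_List.walltime = 00:45:30'

# Same awk program as in the pipeline: add up the hours field only.
hours=$(printf '%s\n' "$walltimes" | awk '{split($3,a,":");sum+=a[1]}END{print sum}')

# Variant (an assumption, not the original command): sum full durations in seconds.
seconds=$(printf '%s\n' "$walltimes" | awk '{split($3,a,":"); sum += a[1]*3600 + a[2]*60 + a[3]} END {print sum}')

echo "hours only: $hours"
echo "total seconds: $seconds"
```

For this sample the hours-only sum is 14, while the full total is 54930 seconds, a little over 15 hours.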

I will keep posting more such tidbits in the future.