Friday, January 23, 2009

xargs - taking output of one command and making input for another

If you would like to take the output of one command and pass it on to another UNIX program, you are in luck. As with many things in Unix/Linux, there is more than one way to do it.

For example, if I need to find all jobs that are running in the queue and get their complete details, I could run either

qstat -f `qstat | grep R | awk '{print $1}'`

or

qstat | grep R | awk '{print $1}' | xargs -n1 qstat -f

In the first version, the command within `backticks` is evaluated first to obtain the list of jobs that are "running", and its output is substituted into the outer 'qstat -f' command line. Within the backticks, 'qstat | grep R' gives the rows for running jobs. The awk command splits each row into whitespace-separated fields and prints the first one ($1), which is the job ID.
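One caveat with 'grep R' is that it matches an R anywhere in the row, not only in the state column. The sketch below uses a made-up qstat listing (the job IDs, names and users are hypothetical) and assumes the state is the 5th column, as in the default PBS/Torque layout; check your own qstat output before relying on that column number.

```shell
# Hypothetical qstat listing; real output depends on your scheduler.
sample='Job id           Name      User   Time Use S Queue
---------------- --------- ------ -------- - -----
101.server       RUN_sim   alice  00:10:00 Q batch
102.server       fit       bob    01:00:00 R batch'

# grep R matches any row containing an "R", including the queued job named RUN_sim.
loose=$(printf '%s\n' "$sample" | grep R | awk '{print $1}')

# Testing the state column itself (assumed to be field 5) keeps only running jobs.
strict=$(printf '%s\n' "$sample" | awk '$5 == "R" {print $1}')

echo "grep R:    $loose"
echo "\$5 == R:   $strict"
```

If the state column sits elsewhere on your system, only the field number in the awk test needs to change.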

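In the second version, 'qstat | grep R | awk '{print $1}'' is evaluated and its output is piped to xargs, which reads the job IDs from standard input and, because of -n1, runs 'qstat -f' once for each of them. Both commands produce the same details, but they differ in how the arguments are passed: the backtick version hands every job ID to a single 'qstat -f' invocation, so a very long list can exceed the system's limit on command-line length, whereas xargs splits the work into separate invocations. Note that xargs also splits its input on whitespace by default; handling arbitrary whitespace safely requires null-delimited input and the -0 option, which is not needed here because job IDs normally contain no spaces.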
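To see how xargs builds its command lines without touching a real queue, 'echo' can be placed in front of the command so that it is printed instead of executed. This is only a sketch; the job IDs below are hypothetical stand-ins for the output of the qstat pipeline.

```shell
# Hypothetical job IDs standing in for: qstat | grep R | awk '{print $1}'
job_ids='101.server
102.server
103.server'

# With -n1, xargs builds one command per job ID (echo prints it instead of running it).
per_job=$(printf '%s\n' "$job_ids" | xargs -n1 echo qstat -f)
printf '%s\n' "$per_job"

# Without -n1, xargs packs as many IDs as fit into a single command line.
batched=$(printf '%s\n' "$job_ids" | xargs echo qstat -f)
printf '%s\n' "$batched"
```

The first form prints three separate 'qstat -f' commands, the second a single command with all three IDs.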

We will continue with xargs and calculate the total number of wallclock hours requested by all the currently running jobs.

qstat | grep R | awk '{print $1}' | xargs -n1 qstat -f | grep Resource_List.walltime | awk '{split($3,a,":");sum+=a[1]}END{print sum}'

As before, the output of the xargs command becomes the input of the 'grep Resource_List.walltime' command, which keeps only the rows that contain the wallclock time for each of these jobs. awk then takes the 3rd field of each row ($3 in the command), which contains the wallclock time formatted as hh:mm:ss.

This string is split on ":" and the first number, namely the hours, is available as "a[1]" in the command. This is repeated for each job, and at the end the total is printed. Note that only the hours are added up; the minutes and seconds of each job are ignored.
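If the minutes and seconds matter, one possible variation is to convert each hh:mm:ss value to seconds before summing. The sketch below runs both versions on a made-up sample of 'qstat -f' lines (the walltime values are hypothetical) so it can be tried without a scheduler.

```shell
# Hypothetical walltime lines as they might appear in `qstat -f` output.
sample='    Resource_List.walltime = 24:00:00
    Resource_List.walltime = 01:30:00
    Resource_List.walltime = 00:45:30'

# Same awk as in the pipeline above: sum only the hours field.
hours=$(printf '%s\n' "$sample" | grep Resource_List.walltime |
  awk '{split($3,a,":"); sum+=a[1]} END{print sum}')

# Variation: convert hh:mm:ss to seconds so minutes and seconds are counted too.
seconds=$(printf '%s\n' "$sample" | grep Resource_List.walltime |
  awk '{split($3,a,":"); sum+=a[1]*3600+a[2]*60+a[3]} END{print sum}')

echo "hours only:    $hours"
echo "total seconds: $seconds"
```

For this sample the hours-only sum is 25, while the full total is 94530 seconds, a little over 26 hours.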

I will keep posting more of such tidbits in the future ....
