
GNU Parallel Script for Job Submission

posted May 19, 2011, 4:20 PM by Dong Xu   [ updated May 26, 2011, 1:46 PM ]
by Matthew Caylor

Usage:
  • parallel.sh pipe [log[=<file>] [append]] [mydir[=<dir>]] [: <arguments for parallel>] :: <program to run with its arguments>
  • parallel.sh count=<positive integer> [log[=<file>] [append]] [mydir[=<dir>]] [: <arguments for parallel>] :: <program to run with its arguments>
  • parallel.sh dir[=<directory>] [log[=<file>] [append]] [mydir[=<dir>]] [: <arguments for parallel>] :: <program to run with its arguments>
  • parallel.sh file[=<file path containing at most one *>] [log[=<file>] [append]] [mydir[=<dir>]] [: <arguments for parallel>] :: <program to run with its arguments>
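
The script itself is not reproduced on this page. As a rough illustration of how the : and :: separators in the usage lines can be handled, here is a minimal bash sketch. The function split_args and the three array names are hypothetical and are not taken from parallel.sh.

```shell
# Hypothetical sketch (not the real parallel.sh): split the command line into
#   MODE_ARGS     - keywords before ':' or '::' (pipe, count=N, log=..., mydir, ...)
#   PARALLEL_ARGS - words between ':' and '::', passed through to parallel
#   PROGRAM_ARGS  - everything after '::', i.e. the program and its arguments
split_args() {
  MODE_ARGS=()
  PARALLEL_ARGS=()
  PROGRAM_ARGS=()
  local section=mode
  while [ "$#" -gt 0 ]; do
    if [ "$section" != program ] && [ "$1" = "::" ]; then
      section=program          # the first '::' starts the program part
    elif [ "$section" = mode ] && [ "$1" = ":" ]; then
      section=parallel         # ':' only acts as a separator before '::'
    else
      case "$section" in
        mode)     MODE_ARGS+=("$1") ;;
        parallel) PARALLEL_ARGS+=("$1") ;;
        program)  PROGRAM_ARGS+=("$1") ;;
      esac
    fi
    shift
  done
  # Fail when there is no program to run.
  [ "${#PROGRAM_ARGS[@]}" -gt 0 ]
}
```

Everything after :: is kept verbatim, so a literal : among the program's own arguments is passed through unchanged.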


parallel.sh does two things automatically. First, it regenerates ~/.parallel/sshloginfile so that lab computers that are down are left out. Second, it calls parallel with the sshloginfile set to .. (which parallel interprets as ~/.parallel/sshloginfile) and with a maximum load set (currently 0.8).
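
This page does not describe how the regeneration works. The sketch below shows one way to do it. It assumes the candidate lab hosts are known by name and treats a host as up when a non-interactive ssh login running true succeeds within five seconds. The names regen_sshloginfile and probe_host are hypothetical.

```shell
# Hypothetical sketch, not the actual parallel.sh code.
# probe_host HOST: succeeds if HOST accepts a non-interactive ssh login within
# 5 seconds. -n stops ssh from reading (and eating) the script's STDIN, which
# matters in pipe mode. Redefine this function to use a different test.
probe_host() {
  ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# regen_sshloginfile OUTFILE HOST...: write the reachable HOSTs to OUTFILE,
# one per line.
regen_sshloginfile() {
  local out=$1 tmp host
  shift
  mkdir -p "$(dirname "$out")" || return 1
  # Build the new list next to OUTFILE and then rename it into place, so that
  # parallel (which re-reads a changed sshloginfile) never sees a half-written list.
  tmp=$(mktemp "$out.XXXXXX") || return 1
  for host in "$@"; do
    if probe_host "$host"; then
      printf '%s\n' "$host" >> "$tmp"
    fi
  done
  mv "$tmp" "$out"
}
```

For example, regen_sshloginfile ~/.parallel/sshloginfile node01 node02 node03 leaves only the reachable nodes in that file, and the special sshloginfile name .. then picks up just those nodes.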

pipe - Passes the script's STDIN on to GNU Parallel, which reads its arguments from it in the usual way.

count - Spawns the stated number of jobs. In the program's arguments, {} is replaced by the job's number.
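
A count=N run needs N arguments for parallel to substitute into {}. The following is a minimal sketch that assumes the jobs are numbered 1 to N; the page does not say where numbering starts. The name count_args is hypothetical.

```shell
# Hypothetical sketch: validate the value of count=N and print 1..N, one per
# line, as the argument list that parallel substitutes into {}.
count_args() {
  case "$1" in
    ''|*[!0-9]*)
      echo "count must be a positive integer, got '$1'" >&2
      return 1 ;;
  esac
  if [ "$1" -le 0 ]; then
    echo "count must be a positive integer, got '$1'" >&2
    return 1
  fi
  seq 1 "$1"
}
```

With GNU Parallel installed, count_args 20 | parallel echo 'job {}' runs one echo per number. The Allspice example below relies on the same idea to give each of its 20 runs a distinct .dlg file name.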

dir - Runs the program once for every file in the current directory, or in the given directory if one is specified with dir=<directory>.
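
The file list for dir could be built as in the following sketch. It assumes that only regular files directly inside the directory count: subdirectories are skipped and hidden files are included. The name dir_args is hypothetical.

```shell
# Hypothetical sketch: print the regular files directly inside a directory
# (default: the current one), one path per line, sorted, for parallel to read.
# Subdirectories are not descended into.
dir_args() {
  local dir=${1:-.}
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  find "$dir" -maxdepth 1 -type f | sort
}
```

Because the list is line-based, a file name that contains a newline would be split in two. Using find ... -print0 together with parallel -0 avoids that problem.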

file - Works like dir, but lets you specify one or more files with a path pattern. The path may contain at most one *, and only in its last component (for example file=*.tar).
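
The file= rule can be checked before the pattern is expanded. This sketch reads "at the end of the path" as "in the last path component", which is consistent with the file=*.tar and file=*.dpf examples. The name file_args is hypothetical.

```shell
# Hypothetical sketch of the file= rule: at most one '*', and no '*' before
# the last '/'. Matching regular files are printed one per line.
file_args() {
  # IFS is emptied locally so that $pattern is globbed but not split on spaces.
  local pattern=$1 f IFS=
  case "$pattern" in
    *'*'*'*'*)
      echo "file=: at most one '*' is allowed: $pattern" >&2
      return 1 ;;
    *'*'*/*)
      echo "file=: '*' is only allowed in the last path component: $pattern" >&2
      return 1 ;;
  esac
  # An unmatched pattern stays literal, so keep only names that exist as files.
  for f in $pattern; do
    [ -f "$f" ] && printf '%s\n' "$f"
  done
  return 0
}
```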

log - Records the output of parallel in a log file, which can be named with log=<file>.

append - Appends to the log file instead of overwriting it. Requires log.
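
The page does not say how the log is written. If the output should appear on the terminal and also be saved, tee handles both cases: plain tee overwrites the file and tee -a appends to it. The sketch below assumes this approach, and the name log_output is hypothetical.

```shell
# Hypothetical sketch of log/append: copy STDIN to STDOUT and to FILE,
# truncating FILE first unless "append" is given.
# Usage: some_command | log_output FILE [append]
log_output() {
  if [ "${2:-}" = append ]; then
    tee -a "$1"
  else
    tee "$1"
  fi
}
```

GNU Parallel's own --joblog option, which could be passed after :, is a different thing. It records each job's run time and exit status, not the job's output.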

mydir - Sets the directory that the jobs run in on the remote hosts. If no directory is given, the directory from which parallel.sh is called is used.

Parallel arguments (:) - Optionally passes extra arguments to parallel. --sshloginfile, -S and --load are not allowed, because parallel.sh sets the host list and the load limit itself. See the GNU Parallel documentation for the available options.
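
A check for the disallowed options might look like the following sketch. In addition to the three options named above, it rejects --slf and --sshlogin, which are GNU Parallel's other spellings of --sshloginfile and -S. It also rejects the --option=value forms. The name check_parallel_args is hypothetical.

```shell
# Hypothetical sketch: refuse the parallel options that parallel.sh sets itself.
check_parallel_args() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --sshloginfile|--sshloginfile=*|--slf|--slf=*|\
      -S|-S?*|--sshlogin|--sshlogin=*|--load|--load=*)
        echo "parallel.sh sets '$arg' itself; remove it" >&2
        return 1 ;;
    esac
  done
  return 0
}
```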

Program (::) - The program to run, followed by its arguments. The first word after :: is taken to be the program, and parallel.sh completes the program's path if it is not in PATH.
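
The exact path-completion rule is not documented here. The sketch below is based on three assumptions: names found in PATH (or shell builtins such as echo) are left alone, anything else is looked up relative to the current directory and made absolute, and the remote hosts see the same filesystem. The name resolve_program is hypothetical.

```shell
# Hypothetical sketch: turn the program name into something remote hosts can run.
resolve_program() {
  local prog=$1
  case "$prog" in
    */*)
      # Explicit path: make a relative one absolute against the current directory.
      if [ -f "$prog" ] && [ -x "$prog" ]; then
        case "$prog" in
          /*) printf '%s\n' "$prog" ;;
          *)  printf '%s\n' "$PWD/${prog#./}" ;;
        esac
      else
        echo "not an executable file: $prog" >&2
        return 1
      fi ;;
    *)
      if command -v "$prog" >/dev/null 2>&1; then
        printf '%s\n' "$prog"            # found in PATH, or a shell builtin
      elif [ -f "$prog" ] && [ -x "$prog" ]; then
        printf '%s\n' "$PWD/$prog"       # executable in the current directory
      else
        echo "program not found: $prog" >&2
        return 1
      fi ;;
  esac
}
```

For example, if autodock4 is an executable in the current directory but not in PATH, resolve_program autodock4 prints $PWD/autodock4. An absolute path such as the one in the Savor example is returned unchanged.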

Examples:

Run echo N times, printing the host each job ran on together with the job number. The single quotes stop $HOSTNAME from being expanded on the submitting machine.
    parallel.sh count=N :: echo '$HOSTNAME: {}'
Gzip every file in the current directory and append the output to $HOME/log.txt
    parallel.sh dir log=$HOME/log.txt append :: gzip -1
Gzip every .tar file in the current directory and append the output to $HOME/log.txt
    parallel.sh file=*.tar log=$HOME/log.txt append :: gzip -1

Autodock submission examples:


Savor
    parallel.sh file=*.dpf mydir :: /home/apps/autodocksuite-4.0.1/bin/i86Linux2/autodock4 -p {} -l {}.dlg

Allspice
    parallel.sh local count=20 mydir :: /home/apps/autodocksuite-4.0.1/bin/i86Linux2/autodock4 -p 1adb_1adb.dpf -l 1adb_1adb.curry.{}.dlg

Beowulf
    parallel.sh count=112 mydir :: /home/dxulab/autodock/autodocksuite-4.0.1/src/autodock/autodock4 -p 1adb_1adb.dpf2 -l 1adb_1adb.curry.{}.dlg2

Check job status (prints each node name followed by the number of autodock processes running on it)
    for i in $(seq -w 1 59); do
        echo node$i
        ssh node$i 'ps aux | grep autodock | grep -v PARALLEL | grep -v grep | wc -l'
    done
