May 19, 2010

an elegant script for multithreading bash execution

ok, we are in 2010 and we all have (at least) a dual-core (i hope ;D)

recently, i bought my first NEW computer (after ten years spent building my computers by assembling used hardware... bought on ebay or similar... now i'm too old for this.. and i bought a _complete_ machine ;D)
...a ht-quadcore i7 with 6gb ram ;D
and now i want all my sw/scripts/etc to run multithreaded.. damn, i have an octo-core (4 cores x 2 threads) and i want to use it!
so, it's a good chance to use xargs.
for example, some days ago i encoded 11 files from dv to theora:
- sequential solution (11 iterations, one after the other):
# a plain glob instead of `ls`, and quoted variables, so odd file names don't break the loop
for DV_FILE in *.dv;
do
 ffmpeg2theora -f dv -o "${DV_FILE/%dv/ogv}" "$DV_FILE"
done
- "parallel" solution (11 parallel processes, all started at once - if u want your cpu(s) to start crying...):
for DV_FILE in *.dv;
do
 # just added & to the command, to send it to the background
 ffmpeg2theora -f dv -o "${DV_FILE/%dv/ogv}" "$DV_FILE" &
done
now, a one-line solution with xargs: suppose we have a script "dv2theora" like this:
#!/bin/bash
# bash (not plain sh) is needed for the ${VAR/%pattern/replacement} substitution
DV_FILE=$1
ffmpeg2theora -f dv -o "${DV_FILE/%dv/ogv}" "$DV_FILE"
and the dv files are named 1.dv 2.dv etc...
then we have
seq -f '%g.dv' 1 11 | xargs -n1 -P3 ./dv2theora
xargs takes the arguments from the pipe and passes them to the command "dv2theora": -n1 means "one argument per invocation" and -P3 means "at most 3 processes in parallel"
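
if u want to see -n1 and -P3 at work without encoding anything, here is a small sketch. the function name demo_parallel and the "sleep 1" stand-in for real work are my assumptions, just for illustration:

```shell
#!/bin/sh
# demo_parallel is a hypothetical helper, only for illustration.
# It feeds the numbers 1..6 to xargs, one argument per invocation (-n1),
# with at most 3 invocations running at the same time (-P3).
# The trailing "_" fills $0 of the inner sh, so the number arrives as $1.
demo_parallel() {
    seq 1 6 | xargs -n1 -P3 sh -c 'sleep 1; echo "done $1"' _
}

demo_parallel
```

the six jobs run in two batches of three, so the whole thing should take about 2 seconds instead of 6. the order of the "done" lines is not guaranteed.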

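but.. where is the problem?
oh, it's simple - as soon as the argument has to go anywhere other than the end of the command, xargs handles the substitution like find, with a {} placeholder (option -I) - bad, very bad.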
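
to judge for yourself, this is roughly what the {} form looks like. it is only a sketch: show_names is a hypothetical helper that prints the output names and never calls ffmpeg2theora:

```shell
#!/bin/sh
# show_names is a hypothetical helper, only for illustration.
# -I{} makes xargs run the command once per input line and replace {}
# with that line; here the line is handed to a small sh snippet as $1,
# which strips the trailing "dv" and appends "ogv".
show_names() {
    printf '%s\n' 1.dv 2.dv 3.dv |
        xargs -I{} sh -c 'echo "$1 -> ${1%dv}ogv"' _ {}
}

show_names
```

it works, but every call needs the placeholder plus some quoting gymnastics.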

so, here is a better solution:

links: mdo ("mdo" means MultipleDOing) or copy and paste from below:
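#!/bin/bash
# bash, not plain sh: the script uses arrays

# number of cpus/threads u want at same time
SMP_PARAMETER=3

# do not modify below
[ $# -lt 1 ] && echo "Usage: mdo COMMAND [ARGS...]" && exit 1
argv=("$@")
COMMAND=${argv[0]}
unset 'argv[0]'
# hand the remaining arguments to xargs, which runs COMMAND once per argument
printf '%s\n' "${argv[@]}" | xargs -n1 -P"$SMP_PARAMETER" "$COMMAND"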
and now i can use it as below:
mdo dv2theora `find . -type f -iname '*.dv'`
just more intuitive
just simpler
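
one caveat: xargs splits its input on whitespace, so a file like "my clip.dv" arrives as two arguments. a possible workaround, assuming your find and xargs support -print0 and -0 (the GNU and BSD ones do), is to pass the names NUL-separated. mdo0 below is a hypothetical variant, not the mdo script linked above:

```shell
#!/bin/sh
# mdo0 is a hypothetical variant of mdo, only for illustration.
# Usage: mdo0 COMMAND DIR
# It finds the *.dv files under DIR and runs COMMAND once per file,
# at most 3 at a time. File names are passed NUL-separated
# (-print0 / -0), so names containing spaces stay in one piece.
mdo0() {
    find "$2" -type f -iname '*.dv' -print0 | xargs -0 -n1 -P3 "$1"
}
```

usage would be something like: mdo0 ./dv2theora .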

enjoy it ;D