Showing posts with label xargs. Show all posts

May 19, 2010

an elegant script for multithreading bash execution

ok, it's 2010 and we all have (at least) a dual-core (i hope ;D)

recently i bought my first NEW computer (after ten years spent building my own machines from used hardware bought on ebay or similar... now i'm too old for this, so i bought a _complete_ machine ;D)
...a hyper-threaded quad-core i7 with 6gb ram ;D
and now i want all my software/scripts/etc. to run in parallel.. damn, i have 8 logical cores and i want to use them!
so, it's a good chance to try xargs.
for example, some days ago i encoded 11 files from dv to theora:
- sequential solution (11 iterations, one after another):
# glob instead of parsing ls, and quote so names with spaces survive
for DV_FILE in *.dv
do
 ffmpeg2theora -f dv -o "${DV_FILE/%dv/ogv}" "$DV_FILE"
done
- "parallel" solution (11 processes at once, if you want to make your cpu(s) cry...):
for DV_FILE in *.dv
do
 # just added & to the command, to send it to the background
 ffmpeg2theora -f dv -o "${DV_FILE/%dv/ogv}" "$DV_FILE" &
done
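one thing the loop above does not do is wait: it returns right away while the encoders are still running. a minimal sketch, assuming bash, of how the jobs could be collected with the `wait` builtin. `fake_encode` is a hypothetical stand-in for ffmpeg2theora (it only sleeps), and `run_all` is just a name made up for this example:

```bash
#!/bin/bash
# fake_encode is a hypothetical stand-in for ffmpeg2theora: it only sleeps briefly
fake_encode() {
    sleep 0.2
    echo "done: $1"
}

run_all() {
    local f
    for f in "$@"; do
        fake_encode "$f" &   # one background job per file, all started at once
    done
    wait                     # block until every background job has exited
}

run_all 1.dv 2.dv 3.dv       # the three "done" lines can show up in any order
```

this still starts everything at once, so it does not limit the load. it only makes sure the script does not move on before the jobs are finished.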
now, an inline solution with xargs: suppose we have a script "dv2theora" like this:
#!/bin/bash
# bash, not sh: the ${VAR/%old/new} substitution is a bashism
DV_FILE=$1
ffmpeg2theora -f dv -o "${DV_FILE/%dv/ogv}" "$DV_FILE"
and the dv files are named 1.dv, 2.dv, etc.
then we have
seq -f '%g.dv' 1 11 | xargs -n1 -P3 ./dv2theora
xargs takes the arguments from the pipe and passes them to the command "dv2theora": -n1 means "one argument per invocation" and -P3 means "up to 3 parallel processes"
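to see what -n1 and -P3 do without encoding anything, here is a small dry run. it is only a sketch: `echo` stands in for ./dv2theora, `demo_parallel` is a name made up for the example, and the file names 1.dv, 2.dv... are assumed as above:

```bash
#!/bin/bash
# echo is a stand-in for ./dv2theora, so nothing gets encoded here
demo_parallel() {
    # -n1: one argument per invocation
    # -P3: up to 3 invocations running at the same time
    seq -f '%g.dv' 1 "$1" | xargs -n1 -P3 echo "encoding"
}

demo_parallel 6   # prints 6 lines; their order can change from run to run
```

since the processes run side by side, the output order is not guaranteed. that is expected and harmless here, because every file is encoded independently.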

but.. where is the problem?
oh, it's simple - to put the argument anywhere other than the end of the command, xargs makes you use {} placeholders (with -I), like find -exec does - bad, very bad.
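for the record, this is roughly what the {} style looks like when there is no wrapper script. a hedged sketch: `dry_run` is a made-up name, and the inner command is only echoed instead of executed, so it is safe to try:

```bash
#!/bin/bash
# dry_run reads file names from stdin, one per line
dry_run() {
    # -I {} replaces {} with one input line per invocation
    # sh -c receives that line as $1, so the output name can be built from it:
    # ${1%dv}ogv strips the trailing "dv" and appends "ogv"
    xargs -P3 -I {} sh -c 'echo ffmpeg2theora -f dv -o "${1%dv}ogv" "$1"' _ {}
}

printf '%s\n' 1.dv 2.dv | dry_run
```

it works, but the quoting gets ugly quickly, which is the whole point of the complaint.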

so, here is a nicer solution:

the script is called mdo ("mdo" means MultipleDOing); copy and paste it from below:
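#!/bin/bash

# number of cpus/threads you want running at the same time
SMP_PARAMETER=3

# do not modify below
# we need the command plus at least one argument to hand to it
[ $# -lt 2 ] && echo "Usage: mdo COMMAND ARG..." && exit 1
# first argument is the command, the rest are its inputs
COMMAND=$1
shift
# NUL-separate the inputs so names with spaces survive (needs xargs -0)
printf '%s\0' "$@" | xargs -0 -n1 -P"$SMP_PARAMETER" "$COMMAND"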
and now i can use it like this:
mdo dv2theora `find . -type f -iname '*.dv'`
just more intuitive
just simpler
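one possible tweak, not part of the original mdo: instead of hardcoding SMP_PARAMETER, it could be read from the machine. a sketch, assuming `getconf` understands `_NPROCESSORS_ONLN` (it does on Linux with glibc and on macOS, other systems may differ); `cpu_count` is a made-up helper name:

```bash
#!/bin/bash
# assumption: getconf knows _NPROCESSORS_ONLN (Linux/glibc and macOS);
# fall back to 1 job if the lookup fails
cpu_count() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

SMP_PARAMETER=$(cpu_count)
echo "running $SMP_PARAMETER jobs at a time"
```

whether one job per logical core is the right number depends on the command: an encoder that is already multithreaded may be happier with fewer.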

enjoy it ;D