SLUG Mailing List Archives
Re: [SLUG] Clustering weirdness
- To: Steven Tucker <tuxta2@xxxxxxxxx>
- Subject: Re: [SLUG] Clustering weirdness
- From: Peter Chubb <peterc@xxxxxxxxxxxxxxxxxx>
- Date: Tue, 15 Nov 2011 21:02:48 +1100
- Cc: slug@xxxxxxxxxxx
>>>>> "Steven" == Steven Tucker <tuxta2@xxxxxxxxx> writes:
Steven> Hi all, got a problem with my cluster using OpenMPI + Torque.
Steven> I can submit 50 different jobs (single process) and the
Steven> batching system will run all 50 in parallel, but I can't get an
Steven> MPI job to run on more than 1 node. I assumed it must be my
Steven> PBS script, but I have tried just about every config I can
Steven> find/think of and still no luck.
I haven't used Torque, but if it's anything like NQS, you need a
different batch queue that's configured with the nodes you want to be
able to use. Also, there's typically a different prologue and epilogue
(differently named files) for parallel as opposed to single-node jobs.

We used to have to do something like
    qmgr -c "set queue batch16 resources_max.nodect=16"
to allow jobs submitted to the queue `batch16' to use up to 16 nodes,
for instance. It's been fifteen years since I used NQS, so my memory
may be faulty. And of course, Torque has its own command set
(although I believe it's based on NQS).
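
One way to tell whether the queue limit or the MPI launch is at fault
is to have the job report how many nodes it was actually given. The
following is only a rough sketch: it assumes Torque's usual behaviour
of listing one line per allocated processor slot in the file named by
$PBS_NODEFILE, the request of two nodes with two slots each is an
arbitrary illustration, and count_nodes is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical resource request: 2 nodes, 2 processor slots on each.
# Torque reads this directive; to the shell it is just a comment.
#PBS -l nodes=2:ppn=2

# count_nodes FILE: print the number of distinct hosts listed in FILE.
# A host appears once per slot it contributes, so duplicates are
# collapsed with sort -u before counting lines.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# PBS_NODEFILE is only set inside a running job; do nothing otherwise.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Job was allocated $(count_nodes "$PBS_NODEFILE") node(s)"
fi
```

If this prints 1 for a job that asked for more, the restriction is
probably on the queue or server side rather than in the MPI command
line.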
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au ERTOS within National ICT Australia