////////////////////////////////////////////////////////////////////
OpenFOAM performance with different implementations of MPI
on a Gigabit Ethernet cluster

Giuseppe Ciaccio, July 2007

Many thanks to Henry Weller and Mattijs Janssens (OpenCFD Ltd)
for their patient support.
////////////////////////////////////////////////////////////////////

Testbed:

OpenFOAM 1.4
Cases (taken from the OpenFOAM tutorials):
*) motorbike: simpleFoam case. 2.3M cells, mainly tet.
*) pitzDaily3D: oodles case. Geometry from Xoodles/pitzDaily3D tutorial.
   244K cells, all hex.

MPIGAMMA 7 july 2007 (GAMMA 12 june 2007)
MPICH2 1.0.5p4 (Nemesis channel device)
OpenMPI 1.2.1
LAM 7.1.3
(all the config parameters are listed below)

18 nodes, each with dual-Xeon @2800 MHz
2 x Intel PRO/1000 Gigabit Eth (192.168.0.X, 192.168.1.X)
Linux 2.6.18.1
gcc/g++ 4.1.1


oodles/pitzDaily3D (wall-clock time for 400 timesteps, in seconds):

(#cpu x #nodes)   MPIGAMMA   MPICH2-nemesis   OpenMPI    LAM
     2 x 18          436          608            600     692
     1 x 18          332          509            510     495
     2 x 9           521          682            691     808
     2 x 8           527          674            686     812
     1 x 8           709          858            853     845
     2 x 4          1125         1275           1289    1415


simpleFoam/motorbike (wall-clock time for 50 timesteps, in seconds):

(#cpu x #nodes)   MPIGAMMA   MPICH2-nemesis   OpenMPI    LAM
     2 x 18          162          167            166     186
     1 x 18          191          193            192     192
     2 x 9           313          316            316     340
     2 x 8           359          347            354     405
     1 x 8           408          408            449     487
     2 x 4           605          611            667     726

(numbers with pure GAMMA are marginally better than MPIGAMMA)


GAMMA 12 june 2007, MPIGAMMA 7 july 2007:

    ./configure -prefix=/usr/local/mpigamma --with-arch=LINUX \
        -cc=gcc -fc=f77 -fortnames=DOUBLEUNDERSCORE \
        -cflags="-fPIC -fomit-frame-pointer" -fflags=-fPIC -optcc=-O3 \
        -without-mpe -gammalib=/usr/lib/libgamma.a --with-device=gamma

launch on node "bin02"

job launch cmd:
    mpirun -np... -machinefile...

machinefile (dual cpu case):
    bin02
    bin03
    bin03
    bin04
    bin04
    bin05
    bin05
    ...etc etc...

machinefile (single cpu case):
    bin03
    bin04
    bin05
    ...etc etc...

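The two MPIGAMMA machinefile layouts above appear to follow a single rule: the launch node is listed one time fewer than the other nodes. A plausible reading, which is an assumption and not stated in these notes, is that mpirun starts the first process on the launch node itself. The sketch below reproduces both listings under that assumption; `machinefile_lines` is a hypothetical helper, not part of MPIGAMMA or MPICH.

```python
def machinefile_lines(launch_node, other_nodes, cpus_per_node):
    """Hypothetical helper: build machinefile entries, one hostname per line.

    Assumption (inferred from the listings, not confirmed by the notes):
    mpirun already places one process on the launch node, so that node is
    listed (cpus_per_node - 1) times and every other node cpus_per_node times.
    """
    if cpus_per_node < 1:
        raise ValueError("cpus_per_node must be at least 1")
    lines = [launch_node] * (cpus_per_node - 1)
    for node in other_nodes:
        lines.extend([node] * cpus_per_node)
    return lines


if __name__ == "__main__":
    # Only the node names that actually appear in the notes are used here;
    # the rest of the cluster is elided in the original listing.
    nodes = ["bin03", "bin04", "bin05"]
    print("\n".join(machinefile_lines("bin02", nodes, cpus_per_node=2)))
```

With `cpus_per_node=1` the launch node drops out of the list entirely, which matches the single cpu listing.
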

MPICH2 1.0.5p4, Nemesis channel device:

    ./configure --with-device=ch3:nemesis --enable-fast --disable-cxx \
        --enable-error-checking=no CFLAGS=-O3 FFLAGS=-O3 \
        --prefix=/home/ciaccio/mpich2-nemesis

boot script:
    #!/bin/bash
    $HOME/mpich2-nemesis/bin/mpdboot -n $((1+`grep -c -v "#" $HOME/mpich2-nodes`)) -f $HOME/mpich2-nodes --ncpus=2 --ifhn=192.168.1.2

mpich2-nodes:
    #192.168.0.2 ifhn=192.168.1.2
    192.168.0.3 ifhn=192.168.1.3
    192.168.0.4 ifhn=192.168.1.4
    ...etc etc...

job launch cmd:
    mpirun -machinefile... -n...

machinefile:
    192.168.1.2:2
    192.168.1.3:2
    192.168.1.4:2
    ...etc etc...


OpenMPI 1.2.1:

    ./configure --prefix=/home/ciaccio/openmpi --disable-mpi-f90 \
        --disable-mpi-profile --disable-heterogeneous \
        CFLAGS=-O3 FFLAGS=-O3 CXXFLAGS=-O3 --disable-mpi-cxx

openmpi-nodes:
    192.168.1.2 slots=2
    192.168.1.3 slots=2
    192.168.1.4 slots=2
    ...etc etc...

job launch cmd:
    mpirun --mca btl tcp,self --mca btl_tcp_if_exclude lo,eth0 -byslot -machinefile ~/openmpi-nodes -np...


LAM 7.1.3:

    ./configure --prefix=/home/ciaccio/lam --disable-tv --disable-tv-queue \
        FFLAGS=-O3 CFLAGS=-O3 --with-gnu-ld --with-rsh=ssh --without-mpi2cpp

    ~/lam/bin/lamboot ~/lam-nodes

lam-nodes:
    192.168.1.2 cpu=2
    192.168.1.3 cpu=2
    192.168.1.4 cpu=2
    ...etc etc...

job launch cmd:
    mpirun -np $1 n0,0,1,1,2,2...   (dual cpu case)
    mpirun -np $1 n0,1,2...         (single cpu case)
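
To compare the implementations at a glance, the timing tables in these notes can be normalised to the MPIGAMMA column. The sketch below is only a reading aid: the numbers are copied from the two tables, while the function name `relative_to_gamma` and the choice of MPIGAMMA as baseline are illustrative assumptions.

```python
# Column order used in both timing tables of these notes.
MPIS = ("MPIGAMMA", "MPICH2-nemesis", "OpenMPI", "LAM")

# Wall-clock seconds, oodles/pitzDaily3D, 400 timesteps.
OODLES = {
    "2 x 18": (436, 608, 600, 692),
    "1 x 18": (332, 509, 510, 495),
    "2 x 9": (521, 682, 691, 808),
    "2 x 8": (527, 674, 686, 812),
    "1 x 8": (709, 858, 853, 845),
    "2 x 4": (1125, 1275, 1289, 1415),
}

# Wall-clock seconds, simpleFoam/motorbike, 50 timesteps.
MOTORBIKE = {
    "2 x 18": (162, 167, 166, 186),
    "1 x 18": (191, 193, 192, 192),
    "2 x 9": (313, 316, 316, 340),
    "2 x 8": (359, 347, 354, 405),
    "1 x 8": (408, 408, 449, 487),
    "2 x 4": (605, 611, 667, 726),
}


def relative_to_gamma(table):
    """Return {config: {mpi: time / MPIGAMMA time}} for one timing table.

    Hypothetical helper; a value above 1.0 means slower than MPIGAMMA.
    """
    result = {}
    for config, times in table.items():
        baseline = times[0]  # MPIGAMMA is the first column
        result[config] = {mpi: t / baseline for mpi, t in zip(MPIS, times)}
    return result


def print_ratios(title, table):
    """Print one row per (#cpu x #nodes) configuration."""
    print(title)
    for config, ratios in relative_to_gamma(table).items():
        row = "  ".join(f"{mpi}={ratios[mpi]:.2f}" for mpi in MPIS)
        print(f"  {config:>6}  {row}")


if __name__ == "__main__":
    print_ratios("oodles/pitzDaily3D", OODLES)
    print_ratios("simpleFoam/motorbike", MOTORBIKE)
```

Ratios are used instead of absolute differences so that the 400-timestep and 50-timestep cases can be read on the same scale.
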