The definitive recipe for running MPI jobs on the EGEE Grid on sites that support mpi-start can be found at http://egee-uig.web.cern.ch/egee-uig/production_pages/MPIJobs.html

To find sites that support mpi-start, add this to your JDL requirements:

Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)

To verify that they are working correctly, look at the SAM tests for the dteam VO (the CE-sft-mpi test encapsulates all MPI tests).

Using sites without mpi-start

If you want the convenience of mpi-start even at sites that have not yet installed it, you can ship a tarball of mpi-start (e.g. mpi-start-0.0.58.tar.gz) in the input sandbox of your job and add the following lines at the start of your wrapper script to set it up:

if [ "x$I2G_MPI_START" = "x" ]; then
    # untar mpi-start and set up variables
    tar xzf mpi-start-*.tar.gz
    export I2G_MPI_START=bin/mpi-start
    MPIRUN=`which mpirun`
    export MPI_MPICH_PATH=`dirname $MPIRUN`
fi
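
The snippet above carries on silently if the tarball is missing or mpirun is not on the PATH. As a rough sketch only, the same setup could be wrapped in a function that reports those failures. The function name setup_mpi_start is made up for this example and is not part of mpi-start; the variables are the ones used above.

```shell
# Sketch: the fallback setup from above with basic error checks.
# setup_mpi_start is a hypothetical helper name, not part of mpi-start.
setup_mpi_start() {
    # Nothing to do if the site already provides mpi-start.
    if [ -n "$I2G_MPI_START" ]; then
        return 0
    fi

    # Unpack the tarball shipped in the input sandbox.
    tar xzf mpi-start-*.tar.gz || {
        echo "could not unpack the mpi-start tarball" >&2
        return 1
    }
    export I2G_MPI_START=bin/mpi-start

    # Locate mpirun; give up if it is not on the PATH.
    MPIRUN=$(command -v mpirun) || {
        echo "mpirun not found in PATH" >&2
        return 1
    }
    # Same value as above: the directory that contains mpirun.
    export MPI_MPICH_PATH="$(dirname "$MPIRUN")"
}
```

In a wrapper script you might call it as "setup_mpi_start || exit 1" before invoking $I2G_MPI_START, so that a broken setup fails the job early instead of failing later inside mpi-start.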

I suggest that you restrict yourself to MPICH in this case, as it is the only flavour of MPI likely to be installed at such sites. It is also a good idea to make sure your jobs are targeted at sites with the pbs (or lsf) jobmanagers due to the problem described here. You can target your jobs by adding this to your requirements:

  other.GlueCEInfoLRMSType=="pbs";

mpi: JobSubmission (last edited 2011-07-12 14:41:39 by localhost)