PSM/ipath related errors¶
On elcid, openmpi by default uses the Performance Scaled Messaging API (PSM), Intel's (formerly QLogic's) low-level, user-level communication interface, which offers a transport abstraction layer that is very effective on InfiniBand-based nodes.
However, this layer provides only a limited number of contexts (the exact number depends on the hardware), i.e. "slots" that software applications can use to communicate.
When running a single job that allocates all the available processors of a node (nodes=1:ppn=64), the job can fully benefit from the performance offered by this implementation.
When sharing a node instead, i.e. running multiple multi-threaded jobs on the same node (for example, by allocating only part of the processors of a node with -l nodes=1:ppn=N, N<64), the jobs compete for these slots: the first job can allocate more of them, exhausting the contexts provided by PSM.
As a consequence, a second job is likely to fail to allocate this resource, and openmpi cannot start, reporting messages like the following:
- PSM was unable to open an endpoint. Network down
- PSM can’t open /dev/ipath for reading and writing (err=23)
- PSM can’t open /dev/ipath, network down (err=26)
- PSM was unable to open an endpoint. Please make sure that the network link is active on the node and the hardware is functioning.
- Error: Failure in initializing endpoint
- 1 more process has sent help message help-mtl-psm.txt / unable to open endpoint
- Driver initialization failure on /dev/ipath (err=23)
- Error: Could not detect network connectivity
- ipath_userinit: assign_context command failed: Network is down
The solution is either to submit jobs that allocate the whole node (or multiple whole nodes):
#PBS -l nodes=1:ppn=64 (or nodes=2:ppn=64, nodes=3:ppn=64, ...)
or to invoke mpirun with options that force it to ignore the psm and openib layers (which may be convenient ONLY when running on a single node asking for fewer than 64 processors):
mpirun --mca mtl ^psm --mca btl ^openib ...
where:
- btl - MPI point-to-point Byte Transfer Layer, used for MPI point-to-point messages on some types of networks
- mtl - Matching Transport Layer, used for MPI point-to-point messages on some types of networks
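For the first option, a whole-node submission script could look like the following sketch (the job name, walltime, and process count are illustrative; test1.x stands for your MPI executable, as in the examples below):

#!/bin/bash
#PBS -N whole_node_job
#PBS -l nodes=1:ppn=64
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
module load openmpi
# the job owns all the PSM contexts of the node, so no workaround is needed
mpirun -np 64 ./test1.x

The mpirun options for the second option can be applied in three ways: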
1) command line¶
$ mpirun -np 3 --mca mtl ^psm --mca btl ^openib ./test1.x
2) environment variables¶
$ export OMPI_MCA_btl=^openib
$ export OMPI_MCA_mtl=^psm
$ mpirun -np 3 ./test1.x
2.1) environment one-liner¶
$ OMPI_MCA_btl=^openib OMPI_MCA_mtl=^psm mpirun -np 3 ./test1.x
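The environment-variable approach also works from inside a batch script when sharing a node; a minimal sketch (job name, walltime, and process count are illustrative):

#!/bin/bash
#PBS -N shared_node_job
#PBS -l nodes=1:ppn=3
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
module load openmpi
# skip the openib btl and the psm mtl so the job requires no PSM context
export OMPI_MCA_btl=^openib
export OMPI_MCA_mtl=^psm
mpirun -np 3 ./test1.x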
3) permanent configuration using the file $HOME/.openmpi/mca-params.conf¶
$ mkdir -v ~/.openmpi
$ cat <<__EOF__ >> $HOME/.openmpi/mca-params.conf
btl=^openib
mtl=^psm
__EOF__

or, equivalently, as a one-liner:

$ mkdir -pv ~/.openmpi && printf 'btl=^openib\nmtl=^psm\n' >> $HOME/.openmpi/mca-params.conf
$ mpirun -np 3 ./test1.x
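As a quick sanity check, the file should now contain exactly the two lines written above:

$ cat $HOME/.openmpi/mca-params.conf
btl=^openib
mtl=^psm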
It is possible to verify that the psm/openib settings are actually picked up by using the following commands:
$ module load openmpi
$ ompi_info -a | egrep -i '(openib|psm)'
MCA btl: parameter "btl" (current value: <^openib>, data source: file [~/.openmpi/mca-params.conf])
MCA mtl: parameter "mtl" (current value: <^psm>, data source: file [~/.openmpi/mca-params.conf])