Ok. When you do the run, are you running tcpdump on the receiver side to see
whether any frames are being lost?
> Finally, I found it rather curious that a stalled MPI code would
> sometimes resume running if we sent a single "ping" to all hosts.
That's quite interesting, although it may simply be that it happened to kick
in while you were doing the ping. That can be hard to measure quantitatively.
Alan