Re: [RFC/RFT PATCH v3] sched: automated per tty task groups

From: Samuel Thibault
Date: Fri Nov 19 2010 - 09:55:17 EST


Peter Zijlstra, on Fri 19 Nov 2010 15:43:13 +0100, wrote:
> > MPI jobs typically communicate with each other. Keeping them on the same
> > socket lets the data used by shared-memory MPI drivers mostly remain in
> > e.g. the L3 cache. That typically gives benefits.
>
> Pushing them away permits them to use a larger part of that same L3
> cache allowing them to work on larger data sets.

But then you are not benefitting from all CPU cores.

> Most of the MPI apps
> have a large compute to communication ratio because that is what allows
> them to run in parallel so well (traditionally the interconnects were
> terribly slow to boot), that suggests that working on larger data sets
> is a good thing and running on the same node really doesn't matter since
> communication is assumed slow anyway.

Err, if the compute to communication ratio is big, then you should use
all CPU cores, up to the point where communication starts to matter
again, and making sure that related MPI processes end up on the same
socket will let you go a bit further.
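
To illustrate what "related processes on the same socket" can mean from
user space, here is a minimal sketch, assuming Linux and the sysfs CPU
topology files. The helper names (cpus_by_socket, sysfs_package_id,
pin_to_socket) are hypothetical and made up for this example; they are
not part of any MPI implementation or of the patch under discussion.

```python
import os

def cpus_by_socket(available, package_of):
    """Group CPU ids by the socket (physical package) they belong to."""
    groups = {}
    for cpu in sorted(available):
        groups.setdefault(package_of(cpu), []).append(cpu)
    return groups

def sysfs_package_id(cpu):
    """Read the socket id of a CPU from Linux sysfs."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/physical_package_id"
    with open(path) as f:
        return int(f.read())

def pin_to_socket(socket_id, package_of=sysfs_package_id):
    """Restrict the calling process to the CPUs of one socket (Linux only)."""
    # Only consider CPUs the process is currently allowed to run on.
    groups = cpus_by_socket(os.sched_getaffinity(0), package_of)
    os.sched_setaffinity(0, groups[socket_id])
    return groups[socket_id]
```

Each process of a communicating group would call pin_to_socket() with the
same socket id. This only sketches the idea; it does not decide which
processes are related, and that is the hard part.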

> There really is no simple solution to this.

I never said there was even a solution, actually (in particular any kind
of generic solution), but that a few simple ways exist to make things
better.

Samuel