Re: TOE brain dump

From: Eric W. Biederman (ebiederm@xmission.com)
Date: Wed Aug 06 2003 - 02:58:56 EST


Werner Almesberger <werner@almesberger.net> writes:

> Eric W. Biederman wrote:
> > MPI is not a transport. It is an interface like the Berkeley sockets
> > layer.
>
> Hmm, but doesn't it also unify transport semantics (i.e. chop
> TCP streams into messages), maybe add reliability to transports
> that don't have it, and provide addressing ? Okay, perhaps you
> wouldn't call this a transport in the OSI sense, but it still
> seems to have considerably more functionality than just
> providing an API.

Those are all features of an MPI implementation. It is not that
MPI lacks an underlying transport. MPI runs over many different
transports, and there is a different MPI implementation for each
one, although a lot of them start from a common base.
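
As an illustration of the kind of work an MPI implementation does on
top of a stream transport such as TCP, here is a minimal sketch of
chopping a byte stream into messages with a length prefix. The
function names and the 4-byte header are assumptions made for this
example, not anything taken from a real MPI implementation.

```python
import struct

# Hypothetical helpers for illustration only; not taken from any MPI
# implementation. The 4-byte big-endian length header is an assumption.


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(payload)) + payload


def unframe(buffer: bytes):
    """Split a received byte stream into complete messages.

    Returns (messages, leftover), where leftover holds the bytes of a
    message that has not fully arrived yet.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from("!I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # partial message, wait for more data
        start = offset + 4
        messages.append(buffer[start:start + length])
        offset = start + length
    return messages, buffer[offset:]
```

The receiver keeps the leftover bytes and prepends them to whatever
arrives next, which is how message boundaries survive on a transport
that only promises an ordered byte stream.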
 
> > Mostly I think that is less true, at least if they can stand the
> > process of severe code review and cleaning up their code.
>
> Hmm, people putting dozens of millions into building clusters
> can't afford to have what is probably their most essential
> infrastructure code reviewed and cleaned up ? Oh dear.

They can afford it. But a lot of the users are researchers, and
a lot of the people writing the code are researchers. So corralling
them and getting production-quality code can be a challenge, as can
getting them to take small enough steps that they don't frighten
the rest of the world.

Plus, ten million dollars pretty much buys you a spot in the top 10 of
the Top500 supercomputer list. The bulk of the clusters are a lot less
expensive than that.
 
> > But of course to get through the peer review process people need
> > to understand what they are doing.
>
> A good point :-)
>
> > So store and forward of packets in a 3 layer switch hierarchy, at 1.3 us
> > per copy.
>
> But your switch could just do cut-through, no ? Or do they
> need to recompute checksums ?

Correct, switches can and generally do implement cut-through in that
kind of environment. I was just showing that even at 10Gbps, treating
a packet as an atomic unit has issues. Cut-through is necessary
to keep your latency down. Do any Ethernet switches do cut-through?
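
To put rough numbers on that, here is a back-of-the-envelope sketch.
Only the 10Gbps rate comes from the discussion above. The 1500-byte
frame, the 64 bytes a cut-through switch reads before it starts
forwarding, and the 5 switches on the path are assumptions picked
for illustration, and propagation and switching delays are ignored.
With those assumptions a frame takes 1.2 us to serialize, which is
in the same range as the 1.3 us per copy quoted above.

```python
# Back-of-the-envelope comparison of store-and-forward vs cut-through.
# All constants except the link rate are illustrative assumptions.

LINK_BPS = 10e9        # 10Gbps, the rate discussed in the thread
FRAME_BYTES = 1500     # assumed frame size
HEADER_BYTES = 64      # assumed bytes a cut-through switch must see first
SWITCHES = 5           # assumed switches on the path (up and down 3 layers)


def serialization_us(nbytes, link_bps=LINK_BPS):
    """Time in microseconds to clock nbytes onto the wire."""
    return nbytes * 8 / link_bps * 1e6


def store_and_forward_us(frame_bytes, switches):
    """The sender and every switch each transmit the whole frame in turn."""
    return (switches + 1) * serialization_us(frame_bytes)


def cut_through_us(frame_bytes, header_bytes, switches):
    """The frame is serialized once; each switch adds only the header time."""
    return (serialization_us(frame_bytes)
            + switches * serialization_us(header_bytes))


if __name__ == "__main__":
    print("store-and-forward: %.2f us"
          % store_and_forward_us(FRAME_BYTES, SWITCHES))
    print("cut-through:       %.2f us"
          % cut_through_us(FRAME_BYTES, HEADER_BYTES, SWITCHES))
```

Under these assumptions the store-and-forward path costs several
full-frame times, while the cut-through path stays close to a single
one, which is the point about not treating the packet as atomic.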
 
> > A lot of the NICs which are used for MPI tend to be smart for two
> > reasons. 1) So they can do source routing. 2) So they can safely
> > export some of their interface to user space, so in the fast path
> > they can bypass the kernel.
>
> The second part could be interesting for TOE, too. Only that
> the interface exported would just be the socket interface.

Agreed.

Eric