Re: TOE brain dump

From: Matti Aarnio (matti.aarnio@zmailer.org)
Date: Wed Aug 06 2003 - 12:55:57 EST

Next message: Jose Luis Domingo Lopez: "Re: linux-2.6-test2, ipsec in tunneling mode"
Previous message: Oleg Drokin: "Re: [PATCH] [2.6] reiserfs: fix locking in reiserfs_remount"
In reply to: Andy Isaacson: "Re: TOE brain dump"
Next in thread: Lincoln Dale: "Re: TOE brain dump"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Aug 06, 2003 at 12:01:45PM -0500, Andy Isaacson wrote:
> On Wed, Aug 06, 2003 at 12:27:17PM -0400, Chris Friesen wrote:
> > Andy Isaacson wrote:
> > > On Wed, Aug 06, 2003 at 10:37:58AM -0300, Werner Almesberger wrote:
> > >>Eric W. Biederman wrote:
> > >>>to keep your latency down. Do any ethernet switches do cut-through?
> > >>According to Google, many at least claim to do this.

Quite a while back (several years ago), several "cut-through" routing
schemes were introduced, primarily over ATM-ish core networks.

The idea ran essentially as follows: "if the header's destination
address is not found in the cache, run the routing lookup and form a VC
to carry the rest of the flow; if a VC is found in the cache, send the
packet there." (What exactly the "VC" is does not matter much.)
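The cache-then-route logic quoted above can be sketched as a toy model.
This is an illustration under assumptions, not any particular product's
implementation: the class name `FlowCache`, the dict standing in for a
routing table, and the integer VC ids are all hypothetical.

```python
class FlowCache:
    """Toy model of cache-then-route forwarding (hypothetical names throughout)."""

    def __init__(self, routes):
        self.routes = routes      # destination -> next hop (stand-in for a routing table)
        self.vcs = {}             # destination -> (vc_id, next_hop): the cache
        self.next_vc_id = 1
        self.route_runs = 0       # how often the slow routing path was taken

    def forward(self, dst):
        """Return the (vc_id, next_hop) that a packet for dst is sent on."""
        vc = self.vcs.get(dst)
        if vc is not None:        # hit: send the packet down the cached VC
            return vc
        self.route_runs += 1      # miss: run routing...
        next_hop = self.routes[dst]
        vc = (self.next_vc_id, next_hop)   # ...and form a VC for the rest of the flow
        self.next_vc_id += 1
        self.vcs[dst] = vc
        return vc


if __name__ == "__main__":
    fc = FlowCache({"10.0.0.1": "port3", "10.0.0.2": "port7"})
    for dst in ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"]:
        print(dst, fc.forward(dst))
    print("routing runs:", fc.route_runs)   # routing runs: 2
```

Only the first packet to each destination pays for the routing lookup;
everything after it follows the cached VC.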

NOTHING in those implementations (as I recall) specified how a packet
should be treated before it had been fully collected into the router's
local buffer memory.

In very high speed local networks (like the Cray T3 series switch fabric
with _routable_ packets) one can implement protocols that carry
destination node address selector bits in the header, and if the fabric
is, e.g., congestion-free, delivery of the bits to the desired
destination is guaranteed. To make UDP-ish communication a bit simpler,
the hardware got a signal back saying "sent OK through" or "collision",
so the sender hardware could automagically retry the transmission.
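A minimal sketch of sender-side retry driven by that feedback, assuming a
hypothetical `send_once()` callable that returns one of the two outcomes
and a hypothetical retry limit `max_tries`. It is not a model of the
actual Cray T3 hardware interface.

```python
SENT_OK = "sent ok"
COLLISION = "collision"


def transmit_with_retry(send_once, max_tries=8):
    """Call send_once() until it reports SENT_OK; return the attempt count.

    Raises RuntimeError if every attempt ends in a collision.
    """
    for attempt in range(1, max_tries + 1):
        if send_once() == SENT_OK:   # fabric confirmed delivery
            return attempt
        # COLLISION: the hardware simply tries again, no software involved
    raise RuntimeError(f"gave up after {max_tries} collisions")


if __name__ == "__main__":
    outcomes = iter([COLLISION, COLLISION, SENT_OK])   # scripted fabric replies
    print("delivered after", transmit_with_retry(lambda: next(outcomes)), "attempts")
```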

To a certain extent one could handle e.g. Ethernet in a similar style,
by fast-switching frames on cached destination MAC addresses.
When the destination MAC lookup points to some output port in the local
hardware, an internal VC is formed (reserved at the output end, presuming
sufficient core bandwidth to handle everything), and the incoming Ethernet
frame is sent piece by piece through the internal switch to the output
port. If the output port cannot be contacted immediately, the full frame
(possibly two or three frames) needs to be buffered at the receiver.

That way the switch-internal buffering delay would be, let's see:
- preamble 7 bytes
- SFD 1 byte
- dest MAC 6 bytes
plus processing delay; those 14 bytes (112 bit times, i.e. 1.12
microseconds at 100 Mbit/s) are the absolute minimum for 100BASE-T.
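The arithmetic behind that minimum, as a quick check: 14 bytes is 112
bits, which takes 1.12 microseconds at 100 Mbit/s. The 10 and 1000
Mbit/s rows are added only for comparison, and processing delay is
ignored, as in the text.

```python
HEADER_BYTES = 7 + 1 + 6          # preamble, SFD, destination MAC


def min_cut_through_delay_ns(link_bps):
    """Time (ns) to clock HEADER_BYTES in from a link of link_bps bits/second."""
    return HEADER_BYTES * 8 * 1e9 / link_bps


if __name__ == "__main__":
    for name, bps in [("10BASE-T", 10e6), ("100BASE-T", 100e6), ("1000BASE-T", 1e9)]:
        print(f"{name:>10}: {min_cut_through_delay_ns(bps):7.1f} ns")
```

For 100BASE-T this gives 1120.0 ns; the byte count is the same at every
speed, only the time per byte changes.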

Cheap cluster supercomputer makers use Ethernet and other "off the
shelf" components, but I don't see why semi-proprietary high-performance
"LANs" could not emerge for this market.
E.g. I would love to have cheapish (a mere 5 times the price of a
copper Gigabit Ethernet card) "LAN" cards for binding clusters together,
especially if they gave me direct access to other machines' memory.

A whole bundle of cluster interconnects is discussed in this white
paper from 2001:

http://www.dell.com/us/en/slg/topics/power_ps4q01-ctcinter.htm

VIA, VI-IP, SCI, FE, InfiniBand, etc.

/Matti Aarnio
