Re: Alan Shih: "TCP IP Offloading Interface"

From: Alan Cox (alan@lxorguk.ukuu.org.uk)
Date: Mon Jul 14 2003 - 15:34:08 EST


On Llu, 2003-07-14 at 21:19, David griego wrote:
> Embedded does not simply include toasters and fridges, it also includes NAS
> and SAN appliances as well as telco gear. These types of devices have

I have some sympathy with SAN developers, because you can treat the link
as dedicated and just for the SAN so you can implement iSCSI optimised
code. The right engineering approach probably consists of removing iSCSI
and doing the job right but I appreciate there are non engineering
issues.

> advanced memory subsystems and run processors such as PPC and ARM. One of
> the most limiting factors in these types of devices is power consumption.

Memory bandwidth is your killer quite often, you have to keep the CPU
away from the data and often even off the memory bus holding most of the
data.

> >deeply embedded stuff also doesn't run with MMUs or runs 'partially
> >trusted' most of the VM games and the socket api games also go away.
>
> See PPC and ARM architecture for the use of MMUs in embedded systems

You still have the partial trust stuff and the ability to violate the
socket api in the interest of speed - that helps.

> >TCP/IP is an exercise in two things when you are running at speed
> >
> >1. Finding the memory bandwidth - ToE doesn't help, checksums do,
> > sg does, on card target buffers do with decent chipsets.
>
> A TOE enabled with RDDP would help eliminate the kernel to user space copy
> (and in the case of SAMBA the copy back to the kernel). This would reduce
> the memory system loading by a third to a half.

Thats mostly an API issue. Socket API found non optimal after 20 years.
Hardly shocking news - the cluster people already changed API, Larry
McVoy proposed stuff like splice years ago because he saw it coming.

> >2. Handling in order perfectly predicted data streams. ToE is
> > overkill for this. Thats about latency to memory and touching
> > as little as possible. The main CPU has a rather good connection
> > to main memory.
> >
> Yes, RDDP would be nice to have though for the reason stated for #1, so the
> hardware would need to at least be TCP aware.

TCP aware hardware is good, segmentation, prediction, buffer/sequence
matchers etc but you don't want the policy parts in silicon

> >repeatedly been screwed by attackers hitting software or other slow
> >paths in the device to attack it.
> The use of ASICs could ensure that TCP processing is as quick as wire speed

Only if your ASIC worst case is wire speed. If your ASIC has one path it
must drop to software for and has a low powered internal CPU to fix up,
you just lost and you'll plow like a cisco with too many filter rules to
do in silicon.

> Try to keep the datapath processing on the TOE, and everything else in the
> OS. Also give the API the ability to turn of the TOE if a hole exists and
> use it like a regular NIC.

Some of this is certainly about how you partition the load - its no
different to things like RAID. We've had hardware raid, software raid,
and finally people are popping up with neat hybrids. We've had software
audio, hardware audio and again now we are seeing hybrids that do
different bits in hardware and software - ditto modems.

> >The internet land speed record is held by a non ToE system, let me know
> >when that changes.
> >
> Layer one network processing is often handled by ASICS, also some of the
> fastest encryption engines are hardware. I suggest we don't wait until your
> proven wrong before making a decision on TOE.

You don't have to. You can go build and test and maintain a set of TOE patches.
Nobody is stopping you. Lots of Linux code exists because someone decided that
the official story was wrong and proved it so.

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Jul 15 2003 - 22:00:54 EST