Re: Tx TCP rates down > 20% - A report.

Paul Gortmaker (gpg109@rsphy6.anu.edu.au)
Wed, 8 May 1996 18:00:19 +1000 (EST)


From "Avery Pennarun" at May 4, 96 12:23:05 pm

> My ARCnet driver does what might be the "streamlining" Linus refers to. It
> follows logic more like this (which has been drastically simplified actually
> as one ARCnet packet from the TCP layer might be broken into 3 or more
> ARCnet packets when using RFC1201 encapsulation):
>
> - TCP layer sends one to ARCnet driver
> - driver marks tbusy=1
> - choose a TX buffer, copy packet into it
> - begin transmit
> - set internal buffer_busy flag=1
> - set tbusy=0

If you go back and re-read what Linus wrote, you will see that you
don't need to set tbusy upon entering your Tx function. As Linus said, your
Tx function is guaranteed to be single threaded without you frobbing tbusy.
What you need to do is set tbusy when falling out the bottom *only* if
your card doesn't have room for another (full sized) packet.
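To make the tbusy rule concrete, here is a minimal user-space sketch, assuming a card with 3kB of on-card transmit space. None of these names (struct fake_dev, fake_xmit(), fake_tx_done(), update_tbusy()) are kernel API; they are hypothetical stand-ins. The model also tracks a simple byte count, whereas a real ne2000-style driver manages fixed Tx slots.

```c
#include <assert.h>

/* Hypothetical user-space model of a driver transmit path.  The names are
 * illustrative only and are not kernel API; only tbusy comes from the text. */

#define MAX_FRAME   1514  /* largest Ethernet frame excluding CRC, bytes */
#define TX_BUF_SIZE 3072  /* assumed on-card transmit space (3kB) */

struct fake_dev {
    int tbusy;      /* 1 = the core must not call fake_xmit() again */
    int tx_used;    /* bytes of on-card Tx space currently occupied */
    int tx_queued;  /* packets copied to the card but not yet sent */
};

/* Busy only if another full-sized frame would not fit on the card. */
static void update_tbusy(struct fake_dev *dev)
{
    dev->tbusy = (TX_BUF_SIZE - dev->tx_used < MAX_FRAME);
}

/* Tx entry point.  The core already serializes calls, so tbusy is not
 * touched on entry; it is only (possibly) set on the way out. */
static int fake_xmit(struct fake_dev *dev, int len)
{
    if (dev->tx_used + len > TX_BUF_SIZE)
        return 1;            /* no room: tell the core to requeue */
    dev->tx_used += len;     /* stands in for copying the packet to the card */
    dev->tx_queued++;        /* a real driver would start the Tx here if idle */
    update_tbusy(dev);
    return 0;
}

/* Tx-complete interrupt: release the space and re-evaluate tbusy. */
static void fake_tx_done(struct fake_dev *dev, int len)
{
    assert(dev->tx_queued > 0 && len <= dev->tx_used);
    dev->tx_used -= len;
    dev->tx_queued--;
    update_tbusy(dev);
}
```

With one full-sized frame queued the model still reports tbusy=0, so the core may hand over a second frame; only after the second does tbusy go to 1, and one Tx-complete clears it again.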

> This means that while the packet is being sent, the kernel can actually be
> loading another packet into the ARCnet buffer. The original skeleton.c, at
> least at the time, did not suggest doing this so the ARCnet driver didn't
> either until rather late in its lifetime (just before Linux 1.2, I think).
>
> As an example of the difference this makes, before I made the change I was
> only getting around 120k/sec maximum, while now I regularly get around
> 190k/sec. This is about a 58% improvement. I expect it will be most
> pronounced in slow cards where copying the packet into the buffer takes a
> very long time, such as my 8-bit ARCnet cards. 8-bit NE2000s, for example,
> might have similar symptoms which have been blamed on mere slow cards. (And
> of course that would be a large portion of the problem :))

No, even 8-bit ne1000 cards do a 3kB/5kB (Tx/Rx) split, and thus can be
handed two full-sized packets per net_bh(). This was Linus' point: things
are generally more efficient if you can hand off two packets per
net_bh() instead of only one.
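The arithmetic behind "two full-sized packets" can be checked in a few lines of C. This assumes the 256-byte buffer pages of the National DP8390 chip these cards are built around, and a 1514-byte maximum Ethernet frame (excluding CRC); the helper names are hypothetical.

```c
#include <assert.h>

/* Worked arithmetic for the 3kB Tx area.  Assumes DP8390 256-byte buffer
 * pages and a 1514-byte maximum frame; helper names are hypothetical. */

#define PAGE_SIZE_8390 256   /* bytes per DP8390 buffer page */
#define MAX_FRAME      1514  /* largest Ethernet frame excluding CRC */

/* Pages needed to hold one frame, rounded up to whole pages. */
static int pages_per_frame(int frame_bytes)
{
    return (frame_bytes + PAGE_SIZE_8390 - 1) / PAGE_SIZE_8390;
}

/* Number of full-sized frames that fit in a Tx area of tx_bytes. */
static int full_frames_in_tx_area(int tx_bytes)
{
    int tx_pages = tx_bytes / PAGE_SIZE_8390;
    return tx_pages / pages_per_frame(MAX_FRAME);
}
```

A full-sized frame needs 6 pages (1536 bytes), and 3kB is 12 pages, so exactly two such frames fit; a 1.5kB Tx area would hold only one.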

> One thing I would like to point out is that fiddling with the tbusy flag
> threw the driver into fits of instability for several months. You have to
> be very, very careful when the tbusy flag is set and when it isn't. I think
> the "serialization" Linus refers to was brought in around 1.2.8 or 1.2.9,
> which is when the ARCnet driver magically stabilized itself.

Yes, you *used* to have to be very careful with tbusy up until 1.2.9 as
it *used* to be responsible for keeping the Tx function single threaded.

> To be realistic, the ARCnet driver is considerably more complicated than most
> network drivers partially because of RFC1201-compliant driver-level
> fragmentation, but that was easy until I started fiddling with tbusy. I was
> overjoyed when someone else fixed things in general around 1.2.9. Debugging

That someone else was me, as part of finally fixing the solid lockups that
the ne2000 cards used to suffer. Note that this fix and the removal
of the responsibility on tbusy for serialization are one and the same.
The solution implemented now is a bit different from what I originally
used; I had used a per-device locking scheme, while at present it uses
start/end_bh_atomic() around the actual driver Tx function in dev.c.
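A structural sketch of that arrangement follows. It is a single-threaded user-space model, so it shows only where the bracketing sits, not real mutual exclusion. bh_atomic_enter()/bh_atomic_exit(), driver_xmit() and core_queue_xmit() are hypothetical stand-ins for start/end_bh_atomic() and the dev.c code, not their actual implementations.

```c
#include <assert.h>

/* Hypothetical model: the core brackets every call into the driver's Tx
 * function, so the driver never uses tbusy to keep itself single threaded.
 * These names stand in for start/end_bh_atomic() and dev.c; they are not
 * the kernel implementations. */

static int bh_atomic_depth;  /* > 0 while the protected region is active */
static int xmit_active;      /* set while the driver's Tx function runs */

static void bh_atomic_enter(void) { bh_atomic_depth++; }
static void bh_atomic_exit(void)  { bh_atomic_depth--; }

/* Driver Tx function: relies entirely on its caller for exclusion. */
static int driver_xmit(int len)
{
    assert(bh_atomic_depth > 0);  /* must only be called inside the region */
    assert(!xmit_active);         /* and is therefore never re-entered */
    xmit_active = 1;
    (void)len;                    /* a real driver copies len bytes here */
    xmit_active = 0;
    return 0;
}

/* Core-side wrapper, playing the role of the code in dev.c. */
static int core_queue_xmit(int tbusy, int len)
{
    int ret;

    if (tbusy)
        return 1;                 /* device full: leave the packet queued */
    bh_atomic_enter();
    ret = driver_xmit(len);
    bh_atomic_exit();
    return ret;
}
```

The design point is that the locking lives in one place in the core rather than in each driver, which is what frees tbusy to mean only "no room for another packet".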

> So to summarize, setting tbusy the "right" way is a good idea performance-wise,
> but you _will_ screw things up (unless you are a better programmer than me,
> and there should be several of those on this list). I would not suggest
> going through all the drivers and just moving tbusy settings around without
> testing _very_ thoroughly.

Don't panic. It isn't that scary, and it won't be done until 2.1.x anyway.
It is more of an editing job than anything else. The code in each driver
that is responsible for handling a Tx timeout needs to be separated from
the Tx function, and assigned to a dev->tx_timeout(dev) function. Then
the code in dev.c can handle timeouts in a sensible fashion. As I said,
a major editing job.
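A minimal sketch of that split, assuming the core runs a periodic check and using a tick counter in place of jiffies. Only dev->tx_timeout(dev) comes from the text above; struct fake_device, its trans_start/resets fields, my_tx_timeout(), core_check_tx() and TX_TIMEOUT_TICKS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the driver supplies only recovery code via
 * tx_timeout(); the core decides when a transmit has hung.  "now" stands
 * in for jiffies, and TX_TIMEOUT_TICKS is an assumed value. */

#define TX_TIMEOUT_TICKS 50

struct fake_device {
    int tbusy;
    unsigned long trans_start;  /* tick at which the last Tx began */
    int resets;                 /* demo counter: number of recoveries */
    void (*tx_timeout)(struct fake_device *dev);
};

/* Driver side: recovery only, no timeout detection logic. */
static void my_tx_timeout(struct fake_device *dev)
{
    dev->resets++;  /* a real driver would reset the chip here */
    dev->tbusy = 0; /* and reopen the queue */
}

/* Core side: called periodically; returns 1 if a timeout was handled. */
static int core_check_tx(struct fake_device *dev, unsigned long now)
{
    if (dev->tbusy && dev->tx_timeout != NULL
        && now - dev->trans_start >= TX_TIMEOUT_TICKS) {
        dev->tx_timeout(dev);
        dev->trans_start = now;  /* restart the clock after recovery */
        return 1;
    }
    return 0;
}
```

Because the detection policy sits in core_check_tx(), every driver gets the same timeout behaviour, and each driver's hook shrinks to the card-specific reset.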

Paul.