Re: Serializing hard_start_xmit()

Paul Gortmaker (paul@rasty.anu.edu.au)
Thu, 30 Jan 1997 20:46:59 +1000 (EST)


>
> > (3) Guarantee that hard_start_xmit() is called atomically with dev->tbusy==0.
> > (This can make network drivers simple.)
> Yes.
>
> I'll apply this when I've been over it. I can see the smp stuff is wrong
> at the moment (lock_kernel is handled entirely at syscall level still)

Yes, this is good, but you have to be careful about how you do it.

Most of the drivers expect hard_start_xmit() to be called with
tbusy != 0 upon some sort of error condition (Tx timeout). If you
simply block calling them when tbusy != 0, you will end up with
a system that will stall on a failed transmit, with no way to
un-wedge itself.
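
To make the stall concrete, here is a rough user-space sketch of the
pattern I mean. Everything in it (toy_dev, toy_start_xmit,
naive_dispatch, fake_jiffies, TOY_TX_TIMEOUT) is made up for
illustration and is not lifted from any real driver; it only assumes
the usual shape where the driver's Tx routine doubles as the timeout
handler.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TX_TIMEOUT 20UL          /* hypothetical timeout, in ticks */

static unsigned long fake_jiffies;   /* stand-in for the kernel's jiffies */

struct toy_dev {                     /* hypothetical, not the real struct device */
    int tbusy;                       /* non-zero while a transmit is outstanding */
    unsigned long trans_start;       /* tick at which the last transmit started */
    int resets;                      /* number of Tx-timeout recoveries */
    int sent;                        /* number of packets handed to the hardware */
};

/* Typical driver shape: the Tx routine also handles the Tx timeout.
 * Returns 0 if the packet was accepted, 1 if the caller should retry. */
static int toy_start_xmit(struct toy_dev *dev)
{
    assert(dev != NULL);
    if (dev->tbusy) {
        if (fake_jiffies - dev->trans_start < TOY_TX_TIMEOUT)
            return 1;                /* still busy, not yet timed out */
        dev->resets++;               /* timed out: pretend to reset the chip */
        dev->tbusy = 0;
    }
    dev->tbusy = 1;                  /* start the new transmit */
    dev->trans_start = fake_jiffies;
    dev->sent++;
    return 0;
}

/* Naive upper layer: never calls the driver while tbusy is set.
 * With the driver above, the recovery path is then unreachable. */
static int naive_dispatch(struct toy_dev *dev)
{
    if (dev->tbusy)
        return 1;
    return toy_start_xmit(dev);
}
```

With tbusy stuck at 1 and the timeout long past, naive_dispatch()
keeps returning 1 forever, while a direct call to toy_start_xmit()
does the reset and sends.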

I had this all reasonably well sorted for early 1.3 but the
patches got lost in the noise of Alan's major skb changes at
that time. I'll quickly detail what I had done so that others
with a bit more free time at the moment (Niibe?) can think about
it. If it isn't fixed up in a month or so, then at that point I'll
probably have enough free time to do it all over again, but at
the moment I am snowed under with non-Linux stuff.

The 1st problem is that each driver checks tbusy != 0 and elapsed
jiffies as a sort of error flag wrt. the previous xmit. This is bad
in terms of code duplication and xmit latency. Also, it breaks in
light of the above change. The 1st step is to recognize the Tx error
handling code in each driver and move it to a separate function
(say dev->tx_failed) so that the upper layers can get at it via
the dev struct. In the process of doing so, bin any tbusy or jiffy
tests in each driver's Tx code. The upper code is now
responsible for *not* calling hard_start_xmit() when tbusy is
set, and in addition, it should watch the jiffies and call
dev->tx_failed() when, say, dev->tx_timeout has elapsed. The
dev->tx_timeout is set by the driver; small for 10Mbps, large for
SLIP and the like. This solves the code duplication, reduces the
latency a bit, and means that the system won't stall on a failed xmit.
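
As a rough illustration of the split (not the lost patches, just a
user-space sketch), the upper layer could look something like this.
The tx_failed and tx_timeout fields are the ones suggested above;
toy_dev, toy_xmit, toy_tx_failed, upper_dispatch and fake_jiffies are
hypothetical names, and I am assuming the driver still sets tbusy and
trans_start itself when it starts a transmit.

```c
#include <assert.h>
#include <stddef.h>

static unsigned long fake_jiffies;   /* stand-in for the kernel's jiffies */

struct toy_dev {                     /* hypothetical, not the real struct device */
    int tbusy;
    unsigned long trans_start;
    unsigned long tx_timeout;        /* set by the driver: small for 10Mbps, large for SLIP */
    int (*hard_start_xmit)(struct toy_dev *dev);
    void (*tx_failed)(struct toy_dev *dev);
    int resets;
    int sent;
};

/* Driver Tx path: no tbusy or jiffies tests left in here. */
static int toy_xmit(struct toy_dev *dev)
{
    dev->tbusy = 1;
    dev->trans_start = fake_jiffies;
    dev->sent++;
    return 0;
}

/* Driver Tx error path, split out so the upper layer can call it. */
static void toy_tx_failed(struct toy_dev *dev)
{
    dev->resets++;                   /* pretend to reset the chip */
    dev->tbusy = 0;
}

/* Upper layer: only calls hard_start_xmit() with tbusy == 0, and
 * calls tx_failed() once tx_timeout ticks have elapsed.
 * Returns 0 if the packet was accepted, 1 if it should be requeued. */
static int upper_dispatch(struct toy_dev *dev)
{
    assert(dev && dev->hard_start_xmit && dev->tx_failed);
    if (dev->tbusy) {
        if (fake_jiffies - dev->trans_start < dev->tx_timeout)
            return 1;                /* busy, not timed out yet */
        dev->tx_failed(dev);         /* let the driver un-wedge itself */
        if (dev->tbusy)
            return 1;                /* recovery did not free the transmitter */
    }
    return dev->hard_start_xmit(dev);
}
```

The driver's Tx routine never sees tbusy != 0 here, and a wedged
transmitter still gets kicked after tx_timeout ticks.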

The main problem I encountered was that there are now a *lot* of
drivers, and the boundary between the Tx error code and the normal
Tx code is not always 100% clear in each driver. This makes it quite
difficult to separate the Tx-error code from the basic Tx code if you
are not familiar with each driver. Hence it is not a trivial job to
hack each driver in this respect.

Oh well. I hope this is of some use to someone in the interim.
Meanwhile, it is back to "real" work for me. <sigh>

Paul.