Re: tulip driver in 2.1.11* - 2.1.21 is broken - new driver

David S. Miller (davem@dm.cobaltmicro.com)
Sun, 13 Sep 1998 20:55:55 -0700


Date: Sun, 13 Sep 1998 12:27:43 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>

David, did any of the 2.1.120 patches change the way the
downcalls into the network drivers were done? I notice that some
irq disables were changed into "bh_atomic()" calls instead, and
that might certainly have changed timings quite a lot. And maybe
one of them really was interrupt-critical rather than bh-critical
(sounds unlikely, as almost everything happens in bh's, and almost
nothing happens in interrupts)?

Short summary: Some broken drivers need to be fixed, and the 2.1.120
changes only made the bugs more visible.

What I now think was happening is that many drivers were getting
away with murder in their transmit routines, and the 2.1.120 patches
finally brought this to light.

Some drivers seem to assume they run with interrupts totally disabled
in their transmit routines. The only thing they were ever allowed to
really assume was that the transmit routine was never re-entered.
This is all we are supposed to guarantee at the generic device
transmit level when calling these functions.
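
To make that guarantee concrete, here is a minimal user-space sketch
of a "never re-entered" guard. It is not the kernel's code: the
struct, the function names and the try_nested test hook are all
hypothetical, and the busy flag only loosely mirrors dev->tbusy.

```c
#include <assert.h>

/* Hypothetical toy device, for illustration only. */
struct guard_dev {
    int tbusy;      /* nonzero while the transmit routine is running */
    int sent;       /* packets accepted by the driver */
    int refused;    /* calls turned away by the guard */
    int try_nested; /* test hook: attempt a nested transmit from inside */
};

static int guarded_queue_xmit(struct guard_dev *dev);

/* Driver transmit routine: may assume it is not re-entered, nothing more. */
static int guarded_start_xmit(struct guard_dev *dev)
{
    if (dev->try_nested) {
        dev->try_nested = 0;
        /* Models a second caller arriving while we are mid-transmit. */
        guarded_queue_xmit(dev);
    }
    dev->sent++;
    return 0;
}

/* Generic layer: refuse to call into the driver while it is busy. */
static int guarded_queue_xmit(struct guard_dev *dev)
{
    if (dev->tbusy) {
        dev->refused++;
        return -1;
    }
    dev->tbusy = 1;
    int rc = guarded_start_xmit(dev);
    dev->tbusy = 0;
    return rc;
}

/* Returns sent * 10 + refused after one transmit with a nested attempt. */
static int guarded_nested_demo(void)
{
    struct guard_dev dev = { .try_nested = 1 };
    guarded_queue_xmit(&dev);
    return dev.sent * 10 + dev.refused;
}
```

Note what the guard does not cover: nothing here stops the driver's
own interrupt handler from running in the middle of the transmit
routine.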

As a side effect, drivers also used to be protected (by luck) from
their own interrupt handlers running during the transmit routine. It
looks like the 3c59x is one of the guilty parties wrt. this.
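
One way such a race can bite, as a user-space sketch only. This is
not taken from any of the drivers named here: the toy_nic struct, the
irqs_off flag standing in for cli()/sti(), and the explicit
"interrupt window" call are all assumptions made for illustration.

```c
#include <assert.h>

#define TX_RING_SIZE 4

/* Hypothetical toy NIC state, for illustration only. */
struct toy_nic {
    unsigned cur_tx;   /* next slot the transmit routine will fill */
    unsigned dirty_tx; /* next slot the interrupt handler will reclaim */
    int tx_full;       /* set by transmit, cleared by the handler */
    int irqs_off;      /* models cli()/sti(): nonzero means masked */
    int irq_pending;   /* a Tx-done interrupt is waiting for delivery */
};

/* Interrupt handler: reclaim every queued slot and reopen the queue. */
static void toy_interrupt(struct toy_nic *nic)
{
    nic->dirty_tx = nic->cur_tx;
    nic->tx_full = 0;
}

/* A point where a pending interrupt is delivered unless masked. */
static void toy_irq_window(struct toy_nic *nic)
{
    if (nic->irq_pending && !nic->irqs_off) {
        nic->irq_pending = 0;
        toy_interrupt(nic);
    }
}

/* Transmit routine; protect != 0 masks interrupts around the ring work. */
static void toy_start_xmit(struct toy_nic *nic, int protect)
{
    if (protect)
        nic->irqs_off = 1;              /* stands in for cli() */

    nic->cur_tx++;
    int full = (nic->cur_tx - nic->dirty_tx >= TX_RING_SIZE);

    toy_irq_window(nic);                /* interrupt lands between test and store */

    if (full)
        nic->tx_full = 1;               /* stale if the handler just ran */

    if (protect) {
        nic->irqs_off = 0;              /* stands in for sti() */
        toy_irq_window(nic);            /* deferred interrupt runs now */
    }
}

/* Queue the packet that fills the ring while a Tx-done irq is pending. */
static int toy_scenario(int protect)
{
    struct toy_nic nic = { .cur_tx = TX_RING_SIZE - 1, .irq_pending = 1 };
    toy_start_xmit(&nic, protect);
    return nic.tx_full;
}
```

In this model toy_scenario(0) leaves tx_full set on an empty ring
with no interrupt left to clear it, so the queue would stall, while
toy_scenario(1) ends with tx_full clear.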

Actually this is good: now these drivers will get fixed up.

This also explains why the "broken" 3c59x.c driver update could be
fixed by forcing its interrupt to run on a single CPU with an IO-APIC
kludge. The bug is still there even with the driver reverted, just
hidden; the updated driver only exposed the problem better due to
different timings.

It also makes sense of the "failures on non-Intel UP" cases.

A quick scan shows:

1) eepro100.c works and always has because it does disable interrupts
when it pokes at the transmit ring and the hardware in its
transmit routine

2) tulip.c appears to be ok, perhaps by luck, even though it doesn't
disable interrupts explicitly. Although I have seen cases where
adding interrupt disabling to the transmit routine made bugs in
this driver go away even in 2.0.x on the Cobalt machines...

3) de4x5.c does a lot of its own locking and does cli/sti around the
critical tx ring mucking in its transmit routine

Looks like a few drivers need to be fixed up about this.

Later,
David S. Miller
davem@dm.cobaltmicro.com
