Re: [patch] drivers/net/plip.c

Alexander Viro (viro@math.psu.edu)
Sun, 31 Jan 1999 05:16:29 -0500 (EST)


On Sun, 31 Jan 1999, Ely Wilson wrote:

> Ok, I bet that was fun putting together. Now if plip_send_packet disables
Well, if you consider drawing finite automates from the code as
fun ;-/
> the irq, and it does, and then exits with a return value of TIMEOUT before
> enabling irq we face the same problem as before your patch.
[snip the analysis]
> Unfortunately my machines don't agree with me. As much as I really really
> want this event to not happen, it must be, because I can't find any other
> explanation for the dead interface and endless transmit timeout messages
> that plague my machine with the current plip.c minus my patch.

OK, several questions: does it happen on both sides? Only on the
slow one? What happens if you are pinging the thing? I.e. does another
side see your packets? (ifconfig)
Another thing: could you include printk's (with KERN_DEBUG
priority) into ENABLE/DISABLE (guess why they became macros ;-) and
into the beginning/end of plip_send_packet/plip_receive_packet?
(convenient variant being: < and > for send, { and } for receive, E(line)
for enable, D(line) for disable). Try to beat the fscker into the hangup
and extract the stuff from the log. It should demonstrate the path where
the shit hits the fan.

> To explain my '1 in 4(or 5) chance' the chances would be that at any given
> point the state of irq could become unknown or 'out of sync' with the
> general operation of the driver.
>
> Something else I found while staring at the code last night was:
>
> plip_bh() passes ERROR as an error value to plip_bh_timeout_error(), in that
> function you will find this code:
>
> if (error != ERROR) { /* Timeout */
[snip]
> } else
> error = HS_TIMEOUT;
>
> which effectively turns your ERROR into HS_TIMEOUT, which in turn makes a
> call to disable_irq()

Erm... Look two lines above that piece. It happens only for
sending.

> plip_receive_packet() (called from plip_bh)
[snip]
> return ERROR;
[snip]

> This is unlikely to happen, and my problem lies within the sending operation
> (or so I think). But this would produce the same result. I hadn't noticed
See above. I agree that reassigning error was dirty hack, but
originally I had written (error == HS_TIMEOUT || error == ERROR &&
nl->connection==PLIP_CN_SEND), looked at the wrapped line and decided that
we could remap ERROR -> HS_TIMEOUT in if() several lines above and be done
with that. Probably it required a comment. Sorry.

> That's it. I just created macros that would prevent this event from
> happening in all cases. And you want to hear some bold ass truth? It makes
> little difference. My solution? I leave the irq alone by making the macros
> Functional-NOPs (; statements). You want to know why? I don't see any
> reason to disable the irq. afaik the PLIP protocol itself relies on each
> end negotiating using PAR_CONTROL_INTER (parport.h) being written it's
> control line. In the event the two ends de-sync the connection attempts to
> resume by sending an ack and continuing (or so I like to believe, hence,
> the secondary-purpose behind the protocol). If this is the case then
> disabling irq yields little or no benefit.

<shrug> I have a nasty gut feeling that you've got a race or two that way.
I can't back it right now, but... OK, I'll dig out my records and look
them through. BTW, there is another implementation of PLIP and we can't
say 'Linux<->Linux link works, screw everything else'. Actually there are
at least two other impementations: BSD one and original Russell's (for
DOS).

> I've been using this 'technique' for a few days now and have had no
> problems, I would like to have someone else's opinion on this though. What
> advantage would be gained? What problem should it be causing that I'm not
> seeing? if you want me to point out how this driver recovers when de-sync "i
> don't know, grep for 'collision'."

Races.

> As a final note on not managing the irq, my xfer is about equal to
> 40kbyte/s, the same as under 2.1.1xx
>
> I look forward to responses, I'm a kernel hacker wanna-be and any
> information sent my way is appreciated, even if you do get verbal, at least
> I'll know. I'm still struggling (between my job and other responsibilities)
> to understand much of how the kernel works. So most of what I'd thought is
> wrong is based on assumption. Things you could help me understand so I can
> shut my trap next time I think something works:
>
> The state of the irq when bh_plip() is called? I assumed irq is enabled.
> What happens if you disable irq from within a bh function? I assumed the
> state is left unchanged, so if you disable irq it remains disabled until re
> enabled. And will it be re-enabled before the next call to a bh function? I
> assumed no since it didn't appear to be working that way.

Ahem... Look at it that way: system call is an additional
processor instruction for userland. Signals correspond to
software-generated interrupts. Real hardware interrupts are mapped to
something similar. We have two-stage mechanism - IRQ triggers the handler
that should do absolute minimum (ACK the interrupt, do IO that should
be done *immediately*) and schedule the appropriate bottom-halves. Bottom
halves are executed on the return from system call, on the rescheduling,
amd on return from exception. It is also done upon the return from IRQ
handling. Bottom halves are executed with interrupts enabled, but they are
serialized wrt each other (grep for do_bottom_half and you'll see).
If you disable IRQ from bh it will stay disabled. Next call of bh
may never happen if the only beast that schedules it is the first-stage
IRQ handler sitting on the (disabled) IRQ.
In case of PLIP there are two bottom halves. One (plip_bh) is
scheduled by interrupt (plip_interrupt) and by the other bh
(plip_kick_bh). plip_kick_bh is scheduled by timer. Typical scenario:
<interrupt>
plip_bh either does everything or does part of work and schedules
plip_kick_bh. In the later case after a while (1/100s) plip_kick_bh
happens and schedules plib_bh, yodda, yodda. That's a receiving side.
Sending side is activated by ethernet layer and then goes the same way.

plip.c emulates a card with 5 subsystems:
1) byte-sender
2) byte-receiver
3) packets-sender
4) packet-receiver
5) top-level control
Each part has its own state. Think of them as of separate ICs. Top-level
part decides what parts will work on the next step and resets them if
needed. It also listens to requests from the kernel (send) and interrupts
from the line (receive).

I don't like the idea of leaving the IRQ enabled by several
reasons. First of all, it has a chance to confuse the hell of top-level
part. Another thing being:
<rant intensity=mild>
Russell did a messy work when he designed the protocol. We have 5
incoming lines. One of them triggers interrupt. When we read from the port
we get those lines in bits 7,5,4 and 3. Interrupt line is in bit 6.
Russell <substitute your favourite expletive> decided to use bits 6,5,4
and 3 for data and bit 7 for sync. Sheesh... Yes, sir, shuffling bits
would be terrible. Except that we can create a table and use it instead of
bitops. Yes, whole horrible 512 bytes. Sync-on-intr would be much nicer,
but no such luck. And we can't change the protocol - compatibility hits
and all such. We could implement the second protocol and that would be the
Right Thing, but that's another story.
</rant>

Please, try to put printk's into the code and look at the log. It
shouldn't hang. Since it does I'ld like to know *why* it happens. We can
try to woodoo it out, but it will mean that we are in for the next portion
of shit several releases later ;-/ I'ld *really* like to look at the log.
I can't reproduce the problem here, so your assistance is needed.
Cheers,
Al

-- 
There are no "civil aviation for dummies" books out there and most of
you would probably be scared and spend a lot of your time looking up
if there was one. :-)			  Jordan Hubbard in c.u.b.f.m

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/