Re: 2.1.125 - Problem. & Aic7xxx 5.1.0

Doug Ledford (dledford@redhat.com)
Mon, 09 Nov 1998 21:18:21 -0500


Gerard Roudier wrote:

First off, I didn't lose or forget about this email. I let it sit for a
month so I could respond without writing a scathing reply that would earn me
rebukes for a week to come.

> Hello Doug,
>
> I just downloaded 2.1.125 and looked into our pre-2.2 kernel.
> No need to tell I think it is perhaps broken in some driver place. ;-)
>
> If I have been able to understand correctly how the thing works:
>
> C code isr:
> -----------
> C0) loop until no interrupt condition:
> C1)   if command complete
> C2)     clear the command complete flag from the isr
> C3)     process completed commands
>       etc ...
>
> Sequencer:
> ----------
> S1) push the complete SCB to the complete queue
> S2) raise the command complete interrupt flag
>
> As your driver is supporting MMIO, C2 may be posted as a PCI write
> transaction. If S1 is also a PCI memory write from the sequencer, it also
> may be posted (I am not sure of that, but I am sure you will confirm or
> infirm).

It is specifically these types of things that the presence of the mb() macro
at the end of every aic_outb() and aic_inb() call is intended to stop. But,
the mb() goes further and also solves things like weak ordering problems on
K6 and the latest PII processors that might re-order certain instructions
without some sort of lock. If the mb() macro doesn't do so, then someone
please feel free to speak up, but that was my understanding when I was
working with Linus to get the right code in there so these wouldn't be
problems.
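
For readers following along, here is a minimal user-space sketch of the
pattern described above: every register access is followed by a full memory
barrier. It is not the driver's code. The names sim_regs, sim_mb(),
sim_outb(), sim_inb() and SIM_INTSTAT are hypothetical stand-ins, the register
window is a plain array, and the C11 fence stands in for the kernel's mb().
Whether a CPU fence alone also flushes a write posted in a PCI bridge is a
separate question that this sketch does not address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SIM_REG_COUNT 256
#define SIM_INTSTAT   0x91   /* hypothetical register offset, for illustration */

/* Hypothetical stand-in for the card's memory-mapped register window. */
static volatile uint8_t sim_regs[SIM_REG_COUNT];

/* Full fence; assumed here to play the role of the kernel's mb(). */
static inline void sim_mb(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Write one register, then fence so later accesses cannot pass the write. */
static inline void sim_outb(unsigned int port, uint8_t val)
{
    assert(port < SIM_REG_COUNT);
    sim_regs[port] = val;
    sim_mb();
}

/* Read one register, then fence so later accesses cannot pass the read. */
static inline uint8_t sim_inb(unsigned int port)
{
    uint8_t val;

    assert(port < SIM_REG_COUNT);
    val = sim_regs[port];
    sim_mb();
    return val;
}
```

With accessors shaped like this, a caller never has to remember the barrier
at each call site; the cost is a fence on every access, including ones that
would not strictly need it.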

> If S1 occurs after C0 and before C2 and both S1 and C2 are posted, then
> your driver can miss the latest SCB completed by S1. If there are no other
> pending commands, the driver may not receive any further interrupt for
> the latest completed SCB.
>
> I also noticed that you are using the following construct for handling
> interrupts:
>
>   do
>   {
>     aic7xxx_isr(irq, dev_id, regs);
>   } while ( (aic_inb(p, INTSTAT) & INT_PEND) );
>
> I often noticed this in BSD stuff. If the purpose is not to lose PCI
> interrupt, let me tell you that such a construct is, in my opinion, kind
> of bad quality band-aid.

This is the comment that made me let this letter sit for a month. Gerard,
if you seriously think I am so lame as to use such a construct as a
"band-aid" to avoid losing PCI interrupts, then we have little if anything
else further to discuss. You and I are obviously in two different worlds.
In any case, for the sake of others on the kernel list, let me elaborate on
the code in question.

I originally wrote that code without the do { } while loop. It was a
single shot run through the interrupt handler and then return. At the time,
the FreeBSD driver *did* have a while loop for its interrupt handler, I
simply chose not to follow it. It worked fine on PCI cards, but it would
sometimes hang on VLB cards. Then I found out why: it was hanging around the
two sections of code where we enable REQINIT handling. When we enabled
REQINITs and then cleared the SEQINT, we missed the edge the first REQINIT
would have caused, so VLB cards, with their edge-triggered interrupts, would
miss the interrupt. I corrected the problem, made note of it in my
comments, then informed Justin Gibbs of it. He corrected it in his driver
as well and removed his while loop. I later added the loop back as an
optimization. The exact code area where I made comments about losing the
interrupt (search for AWAITING_MSG in handle_seqint()) is a prime example.
By the time we have handled the seqint, we've already raised the REQINIT
flag and are ready for another interrupt. I could simply return and let the
interrupt that's pending in the core interrupt code re-call my interrupt
handler, or I could take care of as many REQINIT interrupts as possible
before returning. I don't know about the NCR hardware, but the Adaptec
hardware *is* fast enough that we can end up re-entering the interrupt
handler immediately.
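
To make the optimization argument concrete, here is a small user-space
simulation, offered only as a sketch under stated assumptions. It compares a
single-shot handler, which is re-entered once per pending event, with a
handler wrapped in a do { } while loop that keeps going while the pending bit
is set. Everything here (struct sim_host, sim_isr(), SIM_INT_PEND, the idea
that each pass services exactly one event) is hypothetical and is not taken
from the aic7xxx driver.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_INT_PEND 0x01   /* hypothetical "interrupt pending" status bit */

struct sim_host {
    int pending_events;   /* events the simulated card still wants serviced */
    int handled;          /* events serviced so far */
};

/* Simulated status register read: reports whether work is still pending. */
static uint8_t sim_read_intstat(const struct sim_host *h)
{
    assert(h->pending_events >= 0);
    return h->pending_events > 0 ? SIM_INT_PEND : 0;
}

/* One pass through the simulated handler services exactly one event. */
static void sim_isr(struct sim_host *h)
{
    if (h->pending_events > 0) {
        h->pending_events--;
        h->handled++;
    }
}

/*
 * Single-shot style: each outer iteration models the core interrupt code
 * calling the handler again. Returns the number of handler entries needed.
 */
static int run_single_shot(struct sim_host *h)
{
    int entries = 0;

    while (sim_read_intstat(h) & SIM_INT_PEND) {
        entries++;
        sim_isr(h);
    }
    return entries;
}

/*
 * Looping style: once entered, keep servicing while the pending bit is set.
 * Returns the number of handler entries needed.
 */
static int run_looping(struct sim_host *h)
{
    int entries = 0;

    while (sim_read_intstat(h) & SIM_INT_PEND) {
        entries++;
        do {
            sim_isr(h);
        } while (sim_read_intstat(h) & SIM_INT_PEND);
    }
    return entries;
}
```

Starting both variants with three pending events, the single-shot version is
entered three times and the looping version once, while both service all
three events. The simulation says nothing about real interrupt latency; it
only shows where the saved re-entries come from.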

> On the other hand, you seem to import lots of BSD
> material into Linux kernel. I did so years ago. I think that Linux has
> most subtlenesses in various places than BSD stuff, even if it may look
> sometimes a bit more broken. If would suggest you to be a bit more
> selective in BSD stuff importation into the Linux kernel.

Sometimes your English is broken enough that it's hard to understand
exactly what you mean. This is one of those cases. However, two things are
clear here. One, that you think I import too much FreeBSD stuff, and two, a
subtle indication of what you think about my experience/abilities
(specifically, your tone is patronizing). I import the code that deals
directly with the sequencer and the sequencer itself. I write the rest of
that code myself. I may look at FreeBSD, but anyone who thinks I copy
FreeBSD can look at things where the code isn't dictated by hardware
ordering requirements and see I don't. Simply check the init code, the
reset code, the queue code, etc. In all of those areas it becomes obvious
that there actually is little code shared between FreeBSD and Linux. Now,
having said that, I'm going to ignore the rest of your comment as uneducated
guessing.

--

Doug Ledford <dledford@redhat.com> Opinions expressed are my own, but they should be everybody's.
