Re: better level triggered IRQ management needed

From: Linus Torvalds
Date: Mon Apr 24 2006 - 17:07:10 EST




On Mon, 24 Apr 2006, Arjan van de Ven wrote:
>
> well... the corner case (as rmk described) is full starvation; even
> polling once per second is better than not polling at all there...

I can confirm that even just polling once a second - or even less often -
is a huge improvement.

A long time ago, I had a machine with a 3c509 card that would sometimes
miss receive interrupts for some reason (it may actually have been a bug
in the disable_irq/enable_irq thing on SMP, I forget - this is at least
five years ago).

The 3c509 driver had some five-second timeout thing, which meant that it
would end up polling itself every five seconds regardless, and it made the
difference between a machine that was totally undebuggable over a network,
and one that actually worked surprisingly well (ie you could ssh into it,
and things would work, but have these long pauses every once in a while,
with bursts of data).

Of course, the machine would have been totally useless for real work, but
it made it _much_ easier to see what was going on when things went south.

So "limping along" when things don't work can be a huge time-saver from a
debugging standpoint. So even if it's just that every registered SA_SHIRQ
would get a heartbeat at least once every five seconds (and we'd limit it
to SA_SHIRQ exactly because a driver that doesn't have that set may get
confused if it gets extra interrupts), that might sound totally useless,
but it might actually help somebody who otherwise might just make a pretty
useless "the machine hung" bug-report.

The fake interrupt could even print out a warning if somebody returns
IRQ_HANDLED (since normally there _shouldn't_ have been any work to handle
for it), and if that means that for somebody, things go from "the machine
hung" to "the machine got very slow, and printed out 'fake interrupt for
ide0 returned IRQ_HANDLED!'", that would potentially be a big debug aid.
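
A rough userspace sketch of that heartbeat idea follows. It is only an
illustration, not kernel code: struct fake_irqaction, fake_irq_heartbeat()
and demo_handler() are hypothetical names, and SA_SHIRQ, IRQ_NONE and
IRQ_HANDLED are redefined locally with illustrative values so that the
example builds on its own. The point is just the policy: poke only the
shared handlers, and complain when one of them reports that it actually
had work to do.

```c
#include <stdio.h>
#include <stddef.h>

/* Local stand-ins for the kernel definitions; values are illustrative. */
#define SA_SHIRQ    0x04000000UL

typedef int irqreturn_t;
#define IRQ_NONE    0
#define IRQ_HANDLED 1

typedef irqreturn_t (*irq_handler_fn)(int irq, void *dev_id);

/* Hypothetical, simplified stand-in for the kernel's struct irqaction. */
struct fake_irqaction {
    irq_handler_fn handler;
    unsigned long flags;
    const char *name;
    void *dev_id;
    int irq;
};

/*
 * Hypothetical heartbeat, meant to be run from a slow timer (say every
 * five seconds). Calls each shared handler once and warns when one of
 * them claims work, since that suggests a real interrupt got lost.
 * Returns how many handlers reported IRQ_HANDLED.
 */
static int fake_irq_heartbeat(struct fake_irqaction *actions, size_t n)
{
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        struct fake_irqaction *a = &actions[i];

        /* Handlers without SA_SHIRQ may be confused by extra calls. */
        if (!(a->flags & SA_SHIRQ))
            continue;

        if (a->handler(a->irq, a->dev_id) == IRQ_HANDLED) {
            printf("fake interrupt for %s returned IRQ_HANDLED!\n",
                   a->name);
            found++;
        }
    }
    return found;
}

/* Demo handler: dev_id points at an int "work pending" flag. */
static irqreturn_t demo_handler(int irq, void *dev_id)
{
    int *pending = dev_id;

    (void)irq;
    if (!*pending)
        return IRQ_NONE;    /* nothing to do, as a shared handler should say */
    *pending = 0;           /* "service" the device */
    return IRQ_HANDLED;
}
```

With one shared device that has stuck work, one idle shared device and
one non-shared device, a single heartbeat pass services the stuck device,
prints one warning, and leaves the non-shared handler alone. A second
pass finds nothing and stays silent.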

We've had our ass saved often enough now by the irq storm detector
("irq X: nobody cared" and friends), which has helped debug irqs that
haven't been set up properly, that I'm convinced things like this might
well make a huge difference.

Of course, "things like this" does not necessarily cover the above
scenario. Maybe that is totally useless ;)

Linus