Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannothandle IRQ -1"

From: Linus Torvalds
Date: Fri Oct 06 2006 - 14:13:00 EST




On Fri, 6 Oct 2006, Eric W. Biederman wrote:
>
> Forcing irqs to specific cpus is not something this patch adds. That
> is the way the ioapic routes irqs.

What that patch adds is to make it an ERROR if some irq goes to an
unexpected cpu.

And that very much is wrong.

> Yes. A single problem over several months of testing has been found.

Umm. It got found the moment it became part of the standard tree.

The fact is, "months of testing" is not actually very much, if it's the
-mm tree. That's at best a "good vetting", but it really doesn't prove
anything.

> So this is fairly fundamentally an irq migration problem. If you
> never change which cpu an irq is pointed at you don't have problems,
> as there are no races.

So? Does that change the issue that this new model seems inherently racy?

> The current irq migration logic does everything in the irq handler
> after an irq has been received so we can avoid various kinds of races.

No. You don't understand, or you refuse to face the issue.

The races are in _hardware_, outside the CPU. The fact that we do things
in an irq handler doesn't seem to change a lot.

And what do you intend to do if it turns out that the reason it doesn't
work on x366 is that the _hardware_ just is incompatible with your model?

I'm not saying that's the case, and maybe there's some stupid bug that has
been overlooked, and maybe it can all work fine. But the new model _does_
seem to be at least _potentially_ fundamentally broken.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/