RE: [PATCH 2.6.21-rc5] Flush MSI-X table writes (rev 3)

From: Williams, Mitch A
Date: Fri Mar 30 2007 - 16:49:58 EST


Eric W. Biederman wrote:

>The bug report would be phrased as someone seeing "No irq for vector"
>on x86_64. Unless they are a skilled developer they are unlikely to
>trace it down to not flushing posted writes to a MSI bar during irq
>migration. It part it is a subtle hardware/software race.
>
>I have seen some unresolved bug reports described this way on machines
>that have MSI-X capable hardware. The are against fedora core and I
>can't haven't been able to get the reporters to try 2.6.21-rcX and
>the other irq back ports don't work so well.
>
>I currently count 7 drivers in 2.6.21 that are susceptible to the race
>this bug closes.
>
>This version of the patch seems to meet all of the criteria for being
>obviously correct, and all it does is insert a stupid readl. Not
>very dangerous.
>
>This fix removes a race between hardware and software that you likely
>need to migrate MSI-X irqs a lot to trigger. Which can be the kind of
>thing that bytes you after you have a month of uptime. This is a very
>nasty condition to track down from bug reports. I'd rather not have a
>known culprit out there.

Agreed, this is a subtle bug, and was a real hairball to track down.
Even so, I'm surprised that nobody else has dug into this, since it
should affect anybody running MSI-X. I originally thought I was seeing
a hardware bug, which is why I dug more deeply into the issue.

If Eric is seeing bug reports related to "no vector for IRQ" in the
wild, then I have to change my stance and agree that this should be
pushed to -stable. Every one of those messages indicates that we
hit the race condition.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/