Re: [RESEND] [PATCH v2] [BUGFIX] x86/x86_64: fix CPU offlining triggered "inactive" device IRQ interrruption

From: Eric W. Biederman
Date: Mon Jun 08 2009 - 15:19:45 EST


Ingo Molnar <mingo@xxxxxxx> writes:

> * Gary Hade <garyhade@xxxxxxxxxx> wrote:
>
>> On Wed, Jun 03, 2009 at 04:55:26AM -0700, Eric W. Biederman wrote:
>> > Gary Hade <garyhade@xxxxxxxxxx> writes:
>> >
>> > > Impact: Eliminates a race that can leave the system in an
>> > > unusable state
>> > >
>> > > During rapid offlining of multiple CPUs there is a chance
>> > > that an IRQ affinity move destination CPU will be offlined
>> > > before the IRQ affinity move initiated during the offlining
>> > > of a previous CPU completes. This can happen when the device
>> > > is not very active and thus fails to generate the IRQ that is
>> > > needed to complete the IRQ affinity move before the move
>> > > destination CPU is offlined. When this happens there is an
>> > > -EBUSY return from __assign_irq_vector() during the offlining
>> > > of the IRQ move destination CPU which prevents initiation of
>> > > a new IRQ affinity move operation to an online CPU. This
>> > > leaves the IRQ affinity set to an offlined CPU.
>> > >
>> > > I have been able to reproduce the problem on some of our
>> > > systems using the following script. When the system is idle
>> > > the problem often reproduces during the first CPU offlining
>> > > sequence.
>> >
>> > Nacked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>> >
>> > fixup_irqs() is broken for allowing such a thing.
>>
>> When fixup_irqs() calls the set_affinity function:
>> ...
>> if (desc->chip->set_affinity)
>> desc->chip->set_affinity(irq, affinity);
>> ...
>> it receives no feedback so it obviously expects the set_affinity
>> function or it's called functions to do the right thing by preventing
>> or correctly handling any problems that should arise. In the case of
>> this bug there is obviously a problem happening during the set_affinity
>> function call that needs to be resolved and/or properly handled.
>>
>> When you made your "x86_64 irq: Safely cleanup an irq after moving it."
>> changes (re: http://lkml.org/lkml/2007/2/23/92) you added the check
>> to __assign_irq_vector() that causes it to return -EBUSY if the
>> migration of the IRQ is still in progress:
>> + if ((cfg->move_in_progress) || cfg->move_cleanup_count)
>> + return -EBUSY;
>> +
>> However, you did not add any code to other functions on the
>> call stack to properly deal with this error. When doing this
>> you may have assumed (as I may have also assumed) that the underlying
>> code was solid enough that the handling was not needed. Unfortunately,
>> you apparently did not anticipate the case where an idle or relatively
>> idle device may not generate the IRQ needed to complete the move
>> before the CPU that is still handling that IRQ is offlined.
>>
>> My fix only addresses the issue that caused the -EBUSY return
>> and subsequent mess. It does not address the omitted handling
>> for this error condition. If we were to add the handling to
>> fixup_irq() and the arch and non-arch specific functions above
>> it on the call stack as you may be suggesting, it would be quite
>> involved because of all the things that would need to be undone.
>>
>> I am not certain that my fix plugs the very last hole that could
>> cause the -EBUSY return from __assign_irq_vector() so maybe we
>> should at least add a warning or BUG_ON to make the unhandled
>> error more obvious in the future. I would be happy to provide
>> this via a separate patch.
>
> A WARN_ON_ONCE() patch would certainly be nice, as a reminder and as
> a prodder-tool.

In fixup_irqs such a warning would be reasonable. In assign_irq_vector
it makes no sense.

I just read through the code. Anything that assumes assign_irq_vector
will always succeed is BROKEN. We can not guarantee it. There are
also memory allocation failures and the fundamental problem that we
may have more irqs than can fit on a single cpu.

Furthermore while we require at least two irqs to complete a irq migration
I don't believe we can avoid returning -EBUSY there.

I really really hate these patches that come out of assuming that fixup_irqs
is or ever was working and reasonable.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/