Re: [PATCH] powerpc: xive: ensure active irqd when setting affinity

From: Michael Ellerman
Date: Wed Aug 09 2017 - 02:15:43 EST


Sukadev Bhattiprolu <sukadev@xxxxxxxxxxxxxxxxxx> writes:
> Michael Ellerman [mpe@xxxxxxxxxxxxxx] wrote:
>> Sukadev Bhattiprolu <sukadev@xxxxxxxxxxxxxxxxxx> writes:
>> > From fd0abf5c61b6041fdb75296e8580b86dc91d08d6 Mon Sep 17 00:00:00 2001
>> > From: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
>> > Date: Tue, 1 Aug 2017 20:54:41 -0500
>> > Subject: [PATCH] powerpc: xive: ensure active irqd when setting affinity
>> >
>> > Ensure irqd is active before attempting to set affinity. This should
>> > make the set affinity code more robust. For instance, this prevents
>> > these messages seen on a 4.12 based kernel when taking cpus offline:
>> >
>> > [ 123.053037264,3] XIVE[ IC 00 ] ISN 2 lead to invalid IVE !
>> > [ 77.885859] xive: Error -6 reconfiguring irq 17
>> > [ 77.885862] IRQ17: set affinity failed(-6).
>> >
>> > The underlying problem with taking cpus offline was fixed in 4.13-rc1 by:
>> >
>> > commit 91f26cb4cd3c ("genirq/cpuhotplug: Do not migrated shutdown irqs")
>>
>> So do we still need this? Or is the above only a partial fix?
>
> It would be good to have this fix.
>
> Commit 91f26cb4cd3c fixes the problem, so we wont see the errors with
> that commit applied. But if such a problem were to show up again, xive
> will handle them earlier before hitting those errors.

I'm not sure I'm convinced. We can't handle every possible case of the
higher level code calling us in situations we don't expect.

For example irq_data could be NULL, but we trust the higher level code
not to do that to us.

Also I don't see any other driver doing this check.

$ git grep irqd_is_started
include/linux/irq.h:static inline bool irqd_is_started(struct irq_data *d)
kernel/irq/chip.c: if (irqd_is_started(d)) {
kernel/irq/chip.c: if (irqd_is_started(&desc->irq_data)) {
kernel/irq/cpuhotplug.c: if (irqd_is_per_cpu(d) || !irqd_is_started(d) || !irq_needs_fixup(d)) {


cheers