Re: [PATCH v6 2/3] CPU hotplug, stop-machine: Plug race-window that leads to "IPI-to-offline-CPU"
From: Srivatsa S. Bhat
Date: Fri May 23 2014 - 13:06:40 EST
On 05/23/2014 09:23 PM, Srivatsa S. Bhat wrote:
> On 05/23/2014 09:18 PM, Peter Zijlstra wrote:
>> On Fri, May 23, 2014 at 09:07:18PM +0530, Srivatsa S. Bhat wrote:
>>> On 05/23/2014 09:03 PM, Srivatsa S. Bhat wrote:
>>>> On 05/23/2014 09:01 PM, Peter Zijlstra wrote:
>>>>> On Fri, May 23, 2014 at 08:48:07PM +0530, Srivatsa S. Bhat wrote:
>>>>>> On 05/23/2014 08:42 PM, Peter Zijlstra wrote:
>>>>>>> On Fri, May 23, 2014 at 08:15:35PM +0530, Srivatsa S. Bhat wrote:
>>>>>>>>>> + * During CPU offline, we don't want the other CPUs to send
>>>>>>>>>> + * IPIs to the active_cpu (the outgoing CPU) *after* it has
>>>>>>>>>> + * disabled interrupts (because, then it will notice the IPIs
>>>>>>>>>> + * only after it has gone offline). We can prevent this by
>>>>>>>>>> + * making the other CPUs disable their interrupts first - that
>>>>>>>>>> + * way, they will run the stop-machine code with interrupts
>>>>>>>>>> + * disabled, and hence won't send IPIs after that point.
>>>>>>>
>>>>>>> That's complete nonsense, you can send IPIs all you want with interrupts
>>>>>>> disabled.
>>>>>>>
>>>>>>
>>>>>> True, but that's not what the comment says. It says "you can't send IPIs
>>>>>> because you are running the *stop-machine* loop", because the stop-machine loop
>>>>>> doesn't send IPIs itself! The only possibility of sending IPIs from within
>>>>>> stop-machine is if that CPU takes an interrupt and the *interrupt handler*
>>>>>> sends the IPI (like what the block layer used to do) - and we precisely avoid
>>>>>> that possibility by disabling interrupts. So no IPIs will be sent beyond
>>>>>> this point.
>>>>>
>>>>> but one of those CPUs is running the stop machine function, which calls
>>>>> CPU_DYING which runs all kinds of nonsense and therefore can send IPIs
>>>>> all it wants, right?
>>>>>
>>>>
>>>> Yes, but that CPU certainly won't IPI itself! (We are trying to avoid getting
>>>> IPIs on precisely that CPU - the one which is about to go offline).
>>>>
>>>
>>> And the comment makes that distinction between the "active-cpu" and "other CPUs"
>>> (where active-cpu is the one which runs the stop-machine function and eventually
>>> goes offline). Thus "other CPUs" won't send IPIs after that point, because they
>>> are running the stop-machine loop with interrupts disabled. This ensures that
>>> the "active-cpu" doesn't get any IPIs - which is what we want.
>>
>> OK, so clearly I'm having trouble reading today :/ Makes sense now.
>>
>> But yes, it's unlikely for CPU_DYING to self-IPI, although if you really
>> want, I can do ;-)
>>
>
> Haha :-)
>
>> And I guess the one extra state doesn't hurt too bad for
>> stop_two_cpus().
>>
>
> Ok, that's good then.
>
Actually, this entire patch 2 becomes unnecessary by going with what Frederic
suggested. If we change the warning condition in patch 3 to "if cpu is offline
and there were pending callbacks, only then complain" instead of "if cpu is
offline and we got an IPI, then complain", then it doesn't matter if an IPI
arrives late (either due to hardware latency or due to stop-machine related
race), as long as we flush the callbacks before going offline.
I'll explain this in more detail in the next version of the patchset.
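
To make the difference between the two warning conditions concrete, here is a
minimal user-space sketch. It is only a model: struct cpu_model and the
function names are hypothetical, made up for illustration, and are not taken
from the patch or from the kernel.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for illustration; not a kernel data structure. */
struct cpu_model {
    bool online;
    unsigned int pending_callbacks;
};

/* Condition described for patch 3 as posted: complain whenever an
 * offline CPU receives an IPI, even if there is nothing left to run. */
static bool warn_on_ipi(const struct cpu_model *cpu, bool got_ipi)
{
    return !cpu->online && got_ipi;
}

/* Condition along the lines Frederic suggested: complain only if an
 * offline CPU still has callbacks queued. */
static bool warn_on_pending(const struct cpu_model *cpu)
{
    return !cpu->online && cpu->pending_callbacks > 0;
}

/* Model of the offline path: flush the callbacks first, then go offline. */
static void model_go_offline(struct cpu_model *cpu)
{
    cpu->pending_callbacks = 0; /* stands in for running the queued callbacks */
    cpu->online = false;
}
```

In this model, a late IPI that arrives after model_go_offline() trips
warn_on_ipi() but not warn_on_pending(), because the queue was already
flushed. That is why the race window patch 2 tries to close stops mattering.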
Regards,
Srivatsa S. Bhat