Re: [PATCH v2 2/3] Mutually exclude cpu online and suspend/hibernate
From: Srivatsa S. Bhat
Date: Tue Oct 11 2011 - 23:57:42 EST
On 10/12/2011 03:26 AM, Rafael J. Wysocki wrote:
> On Tuesday, October 11, 2011, Srivatsa S. Bhat wrote:
>> On 10/10/2011 08:46 PM, Srivatsa S. Bhat wrote:
>>> On 10/10/2011 07:56 PM, Peter Zijlstra wrote:
>>>> On Mon, 2011-10-10 at 18:15 +0530, Srivatsa S. Bhat wrote:
>>>>>> + /*
>>>>>> + * Prevent cpu online and suspend/hibernate (including freezer)
>>>>>> + * operations from running in parallel. Fail cpu online if suspend or
>>>>>> + * hibernate has already started.
>>>>>> + */
>>>>>> + if (!trylock_pm_sleep())
>>>>>
>>>>> Would it be better to hook into the suspend/hibernate notifiers and
>>>>> use them to exclude cpu hotplug from suspend/hibernate, instead of
>>>>> trying to take pm_mutex lock like this?
>>>>> Peter, I remember you pointing out in another patch's review
>>>>> (http://thread.gmane.org/gmane.linux.kernel/1198312/focus=1199087)
>>>>> that introducing more locks in cpu hotplug would be a bad idea. Does that
>>>>> comment hold here as well, or is this fine?
>>>>
>>>> Arguably pm_mutex is already involved in the whole hotplug dance due to
>>>> suspend using it, that said, I'm not at all familiar with the whole
>>>> suspend/hibernate side of things.
>>>>
>>>> I tried having a quick look this morning but failed to find the actual
>>>> code.
>>>>
>>>> I think it would be good to have an overview of the various locks and a
>>>> small description of how they interact/nest.
>>>>
>>>
>>> Sure. I'll put together whatever I have understood, in the form of a patch
>>> to Documentation/power directory and post it tomorrow, for the benefit of
>>> all.
>>>
>>
>> Here it is, just as promised :-)
>> http://lkml.org/lkml/2011/10/11/393
>
> Well, I have an idea.
>
> Why don't we make drivers/base/cpu.c:store_online() take pm_mutex
> in addition to calling cpu_hotplug_driver_lock()? This at least
> will make the interface mutually exclusive with suspend/hibernation.
>
Oh, no no.. We shouldn't be doing that even though it seems very
innocuous, because of a subtle reason: the memory hotplug code called
in cpu_up() tries to acquire pm_mutex! So we will end up deadlocking
ourselves, due to recursive locking!
See kernel/cpu.c: cpu_up() calls mem_online_node() [defined in
mm/memory_hotplug.c/mem_online_node() which calls lock_memory_hotplug()
which internally calls lock_system_sleep(), which is where it tries
to get pm_mutex].
So this patchset implements the mutual exclusion in the cpu_up() function
(i.e., a little bit deeper down the road than store_online() ) and solves
the problem.
I thought I had put a comment there in my code warning about this locking
order thing, but I must have removed it in the last minute, just before
posting it....
--
Regards,
Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
Linux Technology Center,
IBM India Systems and Technology Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/