Re: [PATCH v2 2/3] Mutually exclude cpu online and suspend/hibernate

From: Srivatsa S. Bhat
Date: Wed Oct 12 2011 - 17:25:43 EST


On 10/13/2011 01:01 AM, Rafael J. Wysocki wrote:
> On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
>> On 10/12/2011 03:26 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, October 11, 2011, Srivatsa S. Bhat wrote:
>>>> On 10/10/2011 08:46 PM, Srivatsa S. Bhat wrote:
>>>>> On 10/10/2011 07:56 PM, Peter Zijlstra wrote:
>>>>>> On Mon, 2011-10-10 at 18:15 +0530, Srivatsa S. Bhat wrote:
>>>>>>>> + /*
>>>>>>>> + * Prevent cpu online and suspend/hibernate (including freezer)
>>>>>>>> + * operations from running in parallel. Fail cpu online if suspend or
>>>>>>>> + * hibernate has already started.
>>>>>>>> + */
>>>>>>>> + if (!trylock_pm_sleep())
>>>>>>>
>>>>>>> Would it be better to hook into the suspend/hibernate notifiers and
>>>>>>> use them to exclude cpu hotplug from suspend/hibernate, instead of
>>>>>>> trying to take pm_mutex lock like this?
>>>>>>> Peter, I remember you pointing out in another patch's review
>>>>>>> (http://thread.gmane.org/gmane.linux.kernel/1198312/focus=1199087)
>>>>>>> that introducing more locks in cpu hotplug would be a bad idea. Does that
>>>>>>> comment hold here as well, or is this fine?
>>>>>>
>>>>>> Arguably pm_mutex is already involved in the whole hotplug dance due to
>>>>>> suspend using it, that said, I'm not at all familiar with the whole
>>>>>> suspend/hibernate side of things.
>>>>>>
>>>>>> I tried having a quick look this morning but failed to find the actual
>>>>>> code.
>>>>>>
>>>>>> I think it would be good to have an overview of the various locks and a
>>>>>> small description of how they interact/nest.
>>>>>>
>>>>>
>>>>> Sure. I'll put together whatever I have understood, in the form of a patch
>>>>> to Documentation/power directory and post it tomorrow, for the benefit of
>>>>> all.
>>>>>
>>>>
>>>> Here it is, just as promised :-)
>>>> http://lkml.org/lkml/2011/10/11/393
>>>
>>> Well, I have an idea.
>>>
>>> Why don't we make drivers/base/cpu.c:store_online() take pm_mutex
>>> in addition to calling cpu_hotplug_driver_lock()? This at least
>>> will make the interface mutually exclusive with suspend/hibernation.
>>>
>>
>> Oh, no no.. We shouldn't be doing that even though it seems very
>> innocuous, because of a subtle reason: the memory hotplug code called
>> in cpu_up() tries to acquire pm_mutex! So we will end up deadlocking
>> ourselves, due to recursive locking!
>
> So, you're referring to mem_online_node(). Is it actually used
> from any place other than cpu_up()? Perhaps we can remove the
> lock_system_sleep() from lock_memory_hotplug() if store_online()
> acquires pm_mutex? And if we can't, then why exactly?
>

I didn't find any other place where mem_online_node() is called. But I'll
check it more thoroughly to be sure. If cpu_up() is the only place where
it is called, then I think we can do what you said above. But doesn't it
seem to be more intrusive than what my patch does? I mean, what if something
breaks because of that avoidable modification to memory hotplug code?
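
To make the recursive-locking concern quoted above concrete, here is a
minimal user-space sketch of that call chain. It is not kernel code: a
pthread error-checking mutex stands in for pm_mutex so that the second
acquisition reports EDEADLK instead of hanging, and every model_* name is
hypothetical, chosen only for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Stand-in for pm_mutex (assumption: modelled as a pthread mutex). */
static pthread_mutex_t pm_mutex_model;

/*
 * Initialise the model lock as an error-checking mutex, so that a
 * second lock attempt by the owning thread returns EDEADLK rather
 * than blocking forever.
 */
static int model_init(void)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(&pm_mutex_model, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/* Models mem_online_node() -> lock_memory_hotplug() -> lock_system_sleep(). */
static int model_mem_online_node(void)
{
	int ret = pthread_mutex_lock(&pm_mutex_model);

	if (ret)
		return ret;	/* EDEADLK if the caller already holds the lock */
	/* ... node onlining work would happen here ... */
	pthread_mutex_unlock(&pm_mutex_model);
	return 0;
}

/* Models cpu_up(), which calls into the memory hotplug path. */
static int model_cpu_up(void)
{
	return model_mem_online_node();
}

/* Models the proposed store_online(): hold pm_mutex across cpu_up(). */
static int model_store_online_with_pm_mutex(void)
{
	int ret = pthread_mutex_lock(&pm_mutex_model);

	if (ret)
		return ret;
	ret = model_cpu_up();	/* inner path tries to take the same lock */
	pthread_mutex_unlock(&pm_mutex_model);
	return ret;
}

/*
 * Usage: after model_init(), model_cpu_up() succeeds on its own, while
 * model_store_online_with_pm_mutex() reports EDEADLK from the nested
 * acquisition.
 */
```

With an ordinary, non-checking mutex the nested acquisition would simply
block, which is the self-deadlock described in the quoted text.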

>> See kernel/cpu.c: cpu_up() calls mem_online_node() [defined in
>> mm/memory_hotplug.c], which calls lock_memory_hotplug(), which
>> internally calls lock_system_sleep(), which is where it tries to get
>> pm_mutex.
>>
>> So this patchset implements the mutual exclusion in the cpu_up() function
>> (i.e., a little bit deeper down the road than store_online() ) and solves
>> the problem.
>
> Which is not exactly the right place to do that as Peter has already
> indicated. What you want is really the CPU hotplug interface to be
> mutually exclusive with the suspend/hibernation interface, not the
> CPU hotplug itself to be mutually exclusive with suspend/hibernation.
>

Oh, is that what Peter meant? I didn't derive that meaning from what
he said...
Making the interfaces themselves mutually exclusive is also a good idea.
But please see my concerns above, about its implementation aspect.

Since my patch solves the problem with less intrusion, and produces
roughly the same effect as this interface exclusion idea, why not go with
it?

Or am I missing something here?
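
For reference, the fail-fast behaviour described in the patch comment
quoted at the top ("Fail cpu online if suspend or hibernate has already
started") can be sketched in user space as below. This is only a model
under stated assumptions: a pthread mutex stands in for pm_mutex,
pthread_mutex_trylock() stands in for trylock_pm_sleep(), the -EBUSY
return value is an assumption rather than something taken from the patch,
and all model_* names are hypothetical. The sketch also leaves out the
nested lock taken by the memory hotplug path.

```c
#include <errno.h>
#include <pthread.h>

/* Stand-in for pm_mutex (assumption: modelled as a plain pthread mutex). */
static pthread_mutex_t pm_lock_model = PTHREAD_MUTEX_INITIALIZER;

/* Models suspend/hibernate taking the lock when it starts. */
static void model_suspend_begin(void)
{
	pthread_mutex_lock(&pm_lock_model);
}

/* Models suspend/hibernate releasing the lock when it finishes. */
static void model_suspend_end(void)
{
	pthread_mutex_unlock(&pm_lock_model);
}

/*
 * Models a cpu online path that refuses to run while suspend/hibernate
 * is in progress: try the lock once and fail instead of waiting.
 */
static int model_cpu_up_trylock(void)
{
	if (pthread_mutex_trylock(&pm_lock_model) != 0)
		return -EBUSY;	/* assumed error code, not from the patch */

	/* ... cpu bring-up work would happen here ... */

	pthread_mutex_unlock(&pm_lock_model);
	return 0;
}

/*
 * Usage: model_cpu_up_trylock() returns 0 when nothing holds the lock,
 * and -EBUSY between model_suspend_begin() and model_suspend_end().
 */
```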

--
Regards,
Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
Linux Technology Center,
IBM India Systems and Technology Lab
