Re: [PATCH] Documentation/power: Update docs about suspend and CPUhotplug

From: Srivatsa S. Bhat
Date: Wed Oct 12 2011 - 17:13:54 EST


On 10/13/2011 12:49 AM, Rafael J. Wysocki wrote:
> On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
>> On 10/12/2011 03:32 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, October 11, 2011, Srivatsa S. Bhat wrote:
>>>> Update the documentation about the interaction between the suspend (S3) call
>>>> path and the CPU hotplug infrastructure.
>>>> This patch focusses only on the activities of the freezer, cpu hotplug and
>>>> the notifications involved. It outlines how regular CPU hotplug differs from
>>>> the way it is invoked during suspend and also tries to explain the locking
>>>> involved.
>>>>
>>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
>>>> ---
>>>>
>>>> Documentation/power/00-INDEX | 2
>>>> Documentation/power/suspend-and-cpuhotplug.txt | 113 ++++++++++++++++++++++++
>>>> 2 files changed, 115 insertions(+), 0 deletions(-)
>>>> create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt
>>>>
>>>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
>>>> index 45e9d4a..a4d682f 100644
>>>> --- a/Documentation/power/00-INDEX
>>>> +++ b/Documentation/power/00-INDEX
>>>> @@ -26,6 +26,8 @@ s2ram.txt
>>>> - How to get suspend to ram working (and debug it when it isn't)
>>>> states.txt
>>>> - System power management states
>>>> +suspend-and-cpuhotplug.txt
>>>> + - Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
>>>> swsusp-and-swap-files.txt
>>>> - Using swap files with software suspend (to disk)
>>>> swsusp-dmcrypt.txt
>>>> diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
>>>> new file mode 100644
>>>> index 0000000..d0ba411
>>>> --- /dev/null
>>>> +++ b/Documentation/power/suspend-and-cpuhotplug.txt
>>>> @@ -0,0 +1,113 @@
>>>> +Interaction of Suspend code (S3) with the CPU hotplug infrastructure
>>>> + (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>, GPL
>>>> +
>>>> +
>>>> +I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
>>>> +
>>>> +Well, a picture speaks more than a thousand words... So ASCII art follows :-)
>>>> +
>>>> +[This depicts the current design in the kernel, and focusses only on the
>>>> +interactions between suspend call paths involving the freezer and cpu hotplug
>>>> +and also tries to explain the locking involved. It also outlines the
>>>> +notifications involved.]
>>>> +
>>>> +On a high level, the suspend-resume cycle goes like this:
>>>> +
>>>> +|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
>>>> +|tasks | | cpus | | | | cpus | |tasks|
>>>> +
>>>> +
>>>> +More details follow:
>>>> +
>>>> +Regular CPU hotplug Suspend call path
>>>> +------------------- ---------------------------
>>>> +
>>>> +Write 0 (or 1) to Write 'mem' to
>>>> +/sys/devices/system/cpu/cpu*/online /sys/power/state
>>>> + sysfs file syfs file
>>>> + | |
>>>> + | v
>>>> + | Acquire pm_mutex lock
>>>> + | |
>>>> + | v
>>>> + | Send PM_SUSPEND_PREPARE notifications
>>>> + | |
>>>> + | v
>>>> + | Freeze tasks
>>>
>>> OK, so something appears to be missing here. Namely, the task writing to
>>> /sys/devices/system/cpu/cpu*/online should be frozen at this point or
>>> suspend should be aborted. I suppose neither of these happens and I wonder
>>> why exactly.
>>>
>>
>> I have a couple of clarifications to make here:
>> * Firstly, this picture is not meant to represent what happens when regular
>> cpu hotplug and suspend run together. That race condition has not been
>> brought out here. What it does try to explain is, how the regular cpu
>> hotplug path is different from suspend, and where they share common code.
>> Please don't think about timing/race condition when reading it. Its just
>> meant to explain the call path and locking involved.
>
> Well, I didn't understand this part. And the question above is:
>
>> I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
>
> which kind of suggests something different from what you're saying. Care to
> clarify that in the document?
>

Ok, I get it. I'll clarify the question and post the next version.

>> * Secondly, this picture explains the *current* design, and *not* the mutual
>> exclusion design I have proposed between regular cpu hotplug and suspend.
>> The reason being, this doc was written to help everyone understand the
>> current locking schemes, to help evaluate my proposal for a different
>> scheme (mutual exclusion).
>
> I understand that.
>
>> Now, coming to your point, if that task writing to the sysfs file has not
>> been frozen, then the current kernel doesn't abort suspend, which is why we are
>> encountering problems, and which is exactly what my patchset tries to solve.
>> Link to my patchset:
>> http://thread.gmane.org/gmane.linux.documentation/3414/focus=3414
>
> This isn't my point, actually. My point is that the task writing to
> /sys/devices/system/cpu/cpu*/online should be frozen by the freezer.
> If it is not frozen, then the freezer should fail. If that doesn't
> happen, there's a bug that has to be fixed and it is _not_ the lack
> of mutual exclusion. The bug is that, apparently, suspend continues
> even though there is an unfrozen user space process in the system.
>
> Do you have any idea why that happens?
>

Sorry, I think I explained it wrong above. The freezer doesn't have any bugs
in this context. If it fails to freeze the tasks, suspend does get aborted.

But the point here is, suppose the task writing to the 'online' sysfs
file has already entered the kernel, and _now_ the freezer started
freezing tasks, it might encounter trouble in freezing that cpu hotplug
operation that has already begun (because the cpu hotplug online operation
waits on the frozen userspace to get microcode).

So, to clarify again, regular cpu hotplug and the cpu hotplug operations
carried out during suspend are properly serialized in the current kernel.
We have no problems there.

The problem is with freezer and regular cpu hotplug racing with each other,
as illustrated in this scenario:
* the regular cpu online operation continues its journey from userspace
into the kernel, since the freezing has not yet begun.
* then freezer starts and freezes userspace.
* If cpu online has not yet completed the microcode update stuff by now,
it will now start waiting on the frozen userspace unfortunately.
* Now the freezer continues and tries to freeze the remaining tasks. But
due to this wait mentioned above, the freezer won't be able to freeze
the cpu online hotplug task and hence freezing of tasks fails.
[This race condition is where the whole problem lies.]

And if freezing of tasks fails, then suspend gets aborted. So no problems
there again.

Thus, since the race condition between freezer and regular cpu hotplug is
where the problem lies, my patch prevents the two from running simultaneously
by implementing mutual exclusion. But it so happens that since freezer is
almost the first step in the suspend/hibernate path, and since pm_mutex
is acquired in this path at the very beginning and released only after all
the tasks are thawed, I reused this lock to implement the mutual exclusion,
which now translates (from our requirement of mutual exclusion between
just freezer and regular cpu hotplug) to a mutual exclusion between regular
cpu hotplug and the entire suspend operation.

--
Regards,
Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
Linux Technology Center,
IBM India Systems and Technology Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/