Re: [PATCH] Documentation/power: Update docs about suspend and CPU hotplug

From: Rafael J. Wysocki
Date: Sat Oct 15 2011 - 18:40:22 EST


On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
> On 10/13/2011 12:49 AM, Rafael J. Wysocki wrote:
> > On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
> >> On 10/12/2011 03:32 AM, Rafael J. Wysocki wrote:
> >>> On Tuesday, October 11, 2011, Srivatsa S. Bhat wrote:
> >>>> Update the documentation about the interaction between the suspend (S3) call
> >>>> path and the CPU hotplug infrastructure.
> >>>> This patch focusses only on the activities of the freezer, cpu hotplug and
> >>>> the notifications involved. It outlines how regular CPU hotplug differs from
> >>>> the way it is invoked during suspend and also tries to explain the locking
> >>>> involved.
> >>>>
> >>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
> >>>> ---
> >>>>
> >>>> Documentation/power/00-INDEX | 2
> >>>> Documentation/power/suspend-and-cpuhotplug.txt | 113 ++++++++++++++++++++++++
> >>>> 2 files changed, 115 insertions(+), 0 deletions(-)
> >>>> create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt
> >>>>
> >>>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
> >>>> index 45e9d4a..a4d682f 100644
> >>>> --- a/Documentation/power/00-INDEX
> >>>> +++ b/Documentation/power/00-INDEX
> >>>> @@ -26,6 +26,8 @@ s2ram.txt
> >>>> - How to get suspend to ram working (and debug it when it isn't)
> >>>> states.txt
> >>>> - System power management states
> >>>> +suspend-and-cpuhotplug.txt
> >>>> + - Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
> >>>> swsusp-and-swap-files.txt
> >>>> - Using swap files with software suspend (to disk)
> >>>> swsusp-dmcrypt.txt
> >>>> diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
> >>>> new file mode 100644
> >>>> index 0000000..d0ba411
> >>>> --- /dev/null
> >>>> +++ b/Documentation/power/suspend-and-cpuhotplug.txt
> >>>> @@ -0,0 +1,113 @@
> >>>> +Interaction of Suspend code (S3) with the CPU hotplug infrastructure
> >>>> + (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>, GPL
> >>>> +
> >>>> +
> >>>> +I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
> >>>> +
> >>>> +Well, a picture speaks more than a thousand words... So ASCII art follows :-)
> >>>> +
> >>>> +[This depicts the current design in the kernel, and focusses only on the
> >>>> +interactions between suspend call paths involving the freezer and cpu hotplug
> >>>> +and also tries to explain the locking involved. It also outlines the
> >>>> +notifications involved.]
> >>>> +
> >>>> +On a high level, the suspend-resume cycle goes like this:
> >>>> +
> >>>> +|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
> >>>> +|tasks | | cpus | | | | cpus | |tasks|
> >>>> +
> >>>> +
> >>>> +More details follow:
> >>>> +
> >>>> +Regular CPU hotplug Suspend call path
> >>>> +------------------- ---------------------------
> >>>> +
> >>>> +Write 0 (or 1) to Write 'mem' to
> >>>> +/sys/devices/system/cpu/cpu*/online /sys/power/state
> >>>> + sysfs file syfs file
> >>>> + | |
> >>>> + | v
> >>>> + | Acquire pm_mutex lock
> >>>> + | |
> >>>> + | v
> >>>> + | Send PM_SUSPEND_PREPARE notifications
> >>>> + | |
> >>>> + | v
> >>>> + | Freeze tasks
> >>>
> >>> OK, so something appears to be missing here. Namely, the task writing to
> >>> /sys/devices/system/cpu/cpu*/online should be frozen at this point or
> >>> suspend should be aborted. I suppose neither of these happens and I wonder
> >>> why exactly.
> >>>
> >>
> >> I have a couple of clarifications to make here:
> >> * Firstly, this picture is not meant to represent what happens when regular
> >> cpu hotplug and suspend run together. That race condition has not been
> >> brought out here. What it does try to explain is, how the regular cpu
> >> hotplug path is different from suspend, and where they share common code.
> >> Please don't think about timing/race condition when reading it. Its just
> >> meant to explain the call path and locking involved.
> >
> > Well, I didn't understand this part. And the question above is:
> >
> >> I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
> >
> > which kind of suggests something different from what you're saying. Care to
> > clarify that in the document?
> >
>
> Ok, I get it. I'll clarify the question and post the next version.
>
> >> * Secondly, this picture explains the *current* design, and *not* the mutual
> >> exclusion design I have proposed between regular cpu hotplug and suspend.
> >> The reason being, this doc was written to help everyone understand the
> >> current locking schemes, to help evaluate my proposal for a different
> >> scheme (mutual exclusion).
> >
> > I understand that.
> >
> >> Now, coming to your point, if that task writing to the sysfs file has not
> >> been frozen, then the current kernel doesn't abort suspend, which is why we are
> >> encountering problems, and which is exactly what my patchset tries to solve.
> >> Link to my patchset:
> >> http://thread.gmane.org/gmane.linux.documentation/3414/focus=3414
> >
> > This isn't my point, actually. My point is that the task writing to
> > /sys/devices/system/cpu/cpu*/online should be frozen by the freezer.
> > If it is not frozen, then the freezer should fail. If that doesn't
> > happen, there's a bug that has to be fixed and it is _not_ the lack
> > of mutual exclusion. The bug is that, apparently, suspend continues
> > even though there is an unfrozen user space process in the system.
> >
> > Do you have any idea why that happens?
> >
>
> Sorry, I think I explained it wrong above. The freezer doesn't have any bugs
> in this context. If it fails to freeze the tasks, suspend does get aborted.
>
> But the point here is, suppose the task writing to the 'online' sysfs
> file has already entered the kernel, and _now_ the freezer started
> freezing tasks, it might encounter trouble in freezing that cpu hotplug
> operation that has already begun (because the cpu hotplug online operation
> waits on the frozen userspace to get microcode).
>
> So, to clarify again, regular cpu hotplug and the cpu hotplug operations
> carried out during suspend are properly serialized in the current kernel.
> We have no problems there.
>
> The problem is with freezer and regular cpu hotplug racing with each other,
> as illustrated in this scenario:
> * the regular cpu online operation continues its journey from userspace
> into the kernel, since the freezing has not yet begun.
> * then freezer starts and freezes userspace.
> * If cpu online has not yet completed the microcode update stuff by now,
> it will now start waiting on the frozen userspace unfortunately.
> * Now the freezer continues and tries to freeze the remaining tasks. But
> due to this wait mentioned above, the freezer won't be able to freeze
> the cpu online hotplug task and hence freezing of tasks fails.
> [This race condition is where the whole problem lies.]
>
> And if freezing of tasks fails, then suspend gets aborted. So no problems
> there again.

That pretty much is how it is supposed to be.

Do I understand correctly that you're attempting to make suspend always
succeed if CPU hotplug stress test is run in parallel with it?

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/