Re: [PATCH v2 0/7] CPU hotplug, cpusets: Fix issues with cpusets handling upon CPU hotplug

From: Alan Stern
Date: Sat May 05 2012 - 11:24:57 EST


On Fri, 4 May 2012, Peter Zijlstra wrote:

> That said, the whole suspend/resume 'problem' does seem worth fixing and
> is a very special case where we absolutely know we're going to get back
> in the state we are in and userspace isn't actually running. So ideally
> we'd go with the bhat's patch that skips the sched_domain rebuilds
> entirely +- some bug-fixes ;-).

Just as an interesting side comment...

The USB subsystem faced this same problem years ago. The question was:
When a USB device (especially a mass-storage device) is unplugged and
then reconnected, is the new device instance the same as the old one?
Linus stepped in and firmly assured us that it was not. That's very
much like the situation you're describing: If CPU 4 is hot-unplugged
and then a new CPU appears in slot 4, is it the same CPU as before (and
does it therefore belong to the same cpusets as before)?

But this led to problems during suspend, because not all systems could
maintain bus connectivity while the system was asleep, and almost none
can during hibernation. As a result, mounted filesystems would become
unavailable after resume even though the USB storage device had been
plugged in the whole time. To the kernel, it appeared that the device
had been unplugged during suspend and then replugged during resume.

We ended up adopting a special-purpose solution just to handle that
case. It's described in Documentation/usb/persist.txt if you want the
full details. In brief, when the system resumes it checks to see if a
device appears to be present at the same port where a device used to
be. If it is, and if its descriptors match the values remembered for
the former device, then we accept the new device as being the same as
the old one, even though the hardware indicates that the connection was
not maintained during the system sleep.
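
To make the resume-time check concrete, here is a minimal sketch in C. It is not the kernel's actual code: the structures, field choices, and function names (dev_descriptors, port_state, treat_as_same_device) are hypothetical, and which descriptor fields get compared is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the descriptor values
 * remembered for a device. Not a real kernel structure. */
struct dev_descriptors {
    uint16_t vendor_id;
    uint16_t product_id;
    uint16_t bcd_device;
    char serial[32];
};

/* Hypothetical per-port state kept across a system sleep. */
struct port_state {
    bool had_device;              /* a device was attached before suspend */
    struct dev_descriptors saved; /* values remembered for that device */
};

/* Field-by-field comparison of remembered and freshly read descriptors. */
bool descriptors_match(const struct dev_descriptors *a,
                       const struct dev_descriptors *b)
{
    return a->vendor_id == b->vendor_id &&
           a->product_id == b->product_id &&
           a->bcd_device == b->bcd_device &&
           strncmp(a->serial, b->serial, sizeof(a->serial)) == 0;
}

/* Decide on resume whether the device now seen at this port should be
 * accepted as the same one that was there before the sleep.
 * now == NULL means nothing is attached to the port any more. */
bool treat_as_same_device(const struct port_state *port,
                          const struct dev_descriptors *now)
{
    assert(port != NULL);
    if (!port->had_device || now == NULL)
        return false;
    return descriptors_match(&port->saved, now);
}
```

The point of the sketch is that the decision depends only on the port and the remembered descriptors, not on whether the hardware reports that the connection was maintained.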

From my point of view, this suggests that CPU hot-unplug is not quite
the right tool to use during suspend. The CPU doesn't actually go
away; it merely becomes unusable for a while. In other words, this
approach applies an incorrect abstraction. What's really needed is
something a little different: a way to avoid running any tasks on that
CPU while not removing it from the system. If this means some tasks
can no longer run on any CPUs, so be it -- this happens only during
suspend, after all. Then during resume, when the CPU is brought back
up, tasks are allowed to run on it again.
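
As a rough illustration of that abstraction, the sketch below models it with plain bitmasks. Everything in it is hypothetical (cpu_state, quiesce_cpu, unquiesce_cpu, runnable_cpus, and the 64-CPU limit); it is not an existing kernel interface, only one way the idea could be expressed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
struct cpu_state {
    uint64_t present;  /* CPUs that exist in the system; suspend never changes this */
    uint64_t quiesced; /* CPUs temporarily barred from running tasks */
};

/* Stop scheduling on a CPU without removing it from the system. */
void quiesce_cpu(struct cpu_state *s, unsigned int cpu)
{
    assert(cpu < 64);
    s->quiesced |= UINT64_C(1) << cpu;
}

/* Allow tasks to run on the CPU again, e.g. during resume. */
void unquiesce_cpu(struct cpu_state *s, unsigned int cpu)
{
    assert(cpu < 64);
    s->quiesced &= ~(UINT64_C(1) << cpu);
}

/* CPUs a task may run on right now. task_allowed is the task's
 * configured mask and is never modified here, so it survives the
 * sleep intact. The result may be 0, meaning the task cannot run
 * anywhere until a CPU is unquiesced. */
uint64_t runnable_cpus(const struct cpu_state *s, uint64_t task_allowed)
{
    return task_allowed & s->present & ~s->quiesced;
}
```

Because the configured mask is only filtered and never rewritten, nothing needs to be rebuilt or restored when the CPU comes back; clearing the quiesced bit is enough.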

Alan Stern
