Re: [PATCH v2 0/7] CPU hotplug, cpusets: Fix issues with cpusets handling upon CPU hotplug

From: Daniel P. Berrange
Date: Tue May 08 2012 - 09:09:06 EST

On Fri, May 04, 2012 at 02:30:11PM -0700, Nishanth Aravamudan wrote:
> On 04.05.2012 [22:56:21 +0200], Peter Zijlstra wrote:
> > On Fri, 2012-05-04 at 13:46 -0700, Nishanth Aravamudan wrote:
> > > What about other users of cpusets (what are they?)?
> >
> > cpusets came from SGI, it's traditionally used to partition _large_
> > machines. Things like the batch/job-schedulers that go with that type of
> > setup use it.
> Yeah, I recall that usage (or some description similar). Do we have any
> other known users of cpusets (beyond libvirt)?

IIRC, the project also uses cpusets (it has no connection to the libvirt
LXC driver mentioned below, which is an alternative implementation of the same concept).

> > I've no clue why libvirt uses it (or why one would use libvirt for that
> > matter).
> Well, it is the case that libvirt does use it, and libvirt is used
> pretty widely (or so it seems to me). I don't use it (cpusets or libvirt
> :) either, but it seems like we should either tell libvirt directly that
> cpusets are inappropriate for their use-case (once we figure out what
> exactly that is, and why they chose cpusets) or work with them to
> support their use-case?

Libvirt uses the cpuset cgroups functionality in two of its
virtualization drivers:

- LXC. Container based virt. The cpuset controller is used to
constrain all processes running inside the container to a
specific collection of CPUs. While we could use the traditional
sched_setaffinity() syscall at initial startup of the container,
this is not so practical when we want to dynamically change the
affinity of an existing container. It would require that we
iterate over all tasks changing their affinity, and to avoid
fork() race conditions we'd need to suspend the container while
doing this. Thus we've long used the cpuset cgroups controller
for LXC.

- KVM. Full machine virt. By default we use sched_setaffinity
to apply constraints on what host CPUs a VM executes on. Fairly
recently we added the ability to optionally use the cpuset
controller instead (only if the sysadmin has already mounted
it). The advantage of this is that if we update the cpuset
of an existing VM, then IIUC, the kernel will migrate its
allocated memory to be local to the new CPU set mask.
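To make the LXC trade-off concrete, here is a minimal C sketch of the per-task alternative: read the TIDs from a cpuset cgroup's `tasks` file (cgroup v1) and call sched_setaffinity() on each one. The function name and the idea of driving it from the `tasks` file are illustrative assumptions, not libvirt code. With the cpuset controller the same change is a single write of the new CPU list (e.g. "0-3") to the group's `cpuset.cpus` file, and it applies to every member, including tasks forked later into the group.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: apply `mask` to every task (thread) listed in a
 * cgroup v1 "tasks" file, one sched_setaffinity() call per TID.
 * Returns the number of tasks updated, or -1 if the file can't be opened.
 *
 * Race: a task that fork()s after the file was read, but before it was
 * itself updated, creates a child with the old mask that this loop never
 * visits. Closing that window means suspending the container first.
 */
static int set_affinity_all_tasks(const char *tasks_path, const cpu_set_t *mask)
{
    FILE *f = fopen(tasks_path, "r");
    if (f == NULL)
        return -1;

    int updated = 0;
    long tid;
    while (fscanf(f, "%ld", &tid) == 1) {
        if (sched_setaffinity((pid_t)tid, sizeof(*mask), mask) == 0)
            updated++;
        else if (errno != ESRCH) /* ESRCH: the task exited meanwhile */
            perror("sched_setaffinity");
    }
    fclose(f);
    return updated;
}
```

A caller would build the mask with CPU_ZERO()/CPU_SET() and pass a path such as a hypothetical `/sys/fs/cgroup/cpuset/lxc/demo/tasks`; the cost grows with the number of tasks, whereas the cpuset write does not.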

The pain point we're hitting is that upon suspend/restore the cgroups
cpuset masks are not preserved. This is not a problem for server virt
usage scenarios, but it is for desktop users with virt on laptops.
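Until the kernel preserves the masks, one possible userspace stopgap is to snapshot each group's `cpuset.cpus` before suspend and write it back after resume. The C sketch below assumes a cgroup v1 cpuset directory supplied by the caller; the helper names are hypothetical and not an existing libvirt or kernel interface.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: read <cgroup_dir>/cpuset.cpus into buf (newline stripped).
 * Returns 0 on success, -1 on error. */
static int save_cpuset_cpus(const char *cgroup_dir, char *buf, size_t len)
{
    char path[4096];
    snprintf(path, sizeof path, "%s/cpuset.cpus", cgroup_dir);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fgets(buf, (int)len, f) != NULL;
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */
    return 0;
}

/* Hypothetical: write a previously saved list back to cpuset.cpus.
 * Returns 0 on success, -1 on error. */
static int restore_cpuset_cpus(const char *cgroup_dir, const char *saved)
{
    char path[4096];
    snprintf(path, sizeof path, "%s/cpuset.cpus", cgroup_dir);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", saved) >= 0;
    /* stdio buffers the data, so the write(2), and any rejection by the
     * kernel, may only happen inside fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Because a child cpuset's CPUs must be a subset of its parent's, a real tool would presumably have to restore parent groups before their children, and only once the CPUs are back online.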

I don't see a viable alternative to the cpuset controller for our LXC
container driver. For KVM we could do without the cpuset controller
if there were an alternative way to tell the kernel to migrate the KVM
process memory to be local to the new CPU affinity set using the
sched_setaffinity() call.
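For the KVM case it may help to spell out what the cpuset path does for memory. With the cgroup v1 cpuset files, it is `cpuset.mems`, not `cpuset.cpus`, that decides which NUMA nodes a group allocates from, and already-allocated pages are only moved when `cpuset.memory_migrate` is enabled; sched_setaffinity() on its own only changes where the threads run. A minimal C sketch of re-pinning a VM's cpuset that way, assuming the caller passes the VM's cpuset directory (the helper names are hypothetical):

```c
#include <stdio.h>

/* Hypothetical: write `value` to <cgroup_dir>/<file>. Returns 0 on success. */
static int cg_write(const char *cgroup_dir, const char *file, const char *value)
{
    char path[4096];
    snprintf(path, sizeof path, "%s/%s", cgroup_dir, file);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0) /* the buffered write reaches the kernel here */
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Hypothetical: move a VM's cpuset to new CPUs and memory nodes.
 * memory_migrate is enabled before cpuset.mems changes; without it,
 * pages that were already allocated stay on their old nodes.
 */
static int repin_vm_cpuset(const char *cgroup_dir, const char *cpus, const char *mems)
{
    if (cg_write(cgroup_dir, "cpuset.memory_migrate", "1") != 0)
        return -1;
    if (cg_write(cgroup_dir, "cpuset.cpus", cpus) != 0)
        return -1;
    return cg_write(cgroup_dir, "cpuset.mems", mems);
}
```

For example, `repin_vm_cpuset("/sys/fs/cgroup/cpuset/libvirt/qemu/vm1", "4-7", "1")` (a made-up path) would pin the VM to CPUs 4-7 and move its memory to node 1.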

We are open to suggestions of alternative approaches, particularly since
we have had no end of trouble with pretty much all of the kernel's
cgroups controllers :-(
