I am deeply sorry.
I was busy the first time I read this, so I postponed answering and ended up
Sorry.
So I looked at something like this in the past. To make sure things
Ok. So I was wrong in my hunch that it would be outside the runqueue,
include/linux/sched.h:
unsigned long long run_delay; /* time spent waiting on a runqueue */
So if you are out of the runqueue, you won't get steal time accounted,
and then I truly fail to understand what you are doing.
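For reference, the per-task run_delay value quoted above is also exposed on the host as the second field of /proc/<pid>/schedstat (time spent waiting on a runqueue, in nanoseconds). As a rough sketch only, assuming that file layout, the helper below (runqueue_wait_pct is a hypothetical name) turns two samples of a vCPU thread's schedstat into the share of an interval the thread spent runnable but waiting, which is roughly what a guest would see as steal time.

```shell
#!/usr/bin/env bash
# Hypothetical helper, a sketch only. Assumes each sample is the contents of
# /proc/<pid>/schedstat: "<ns on cpu> <ns waiting on a runqueue> <timeslices>".
# Prints the run_delay delta as an integer percentage of interval_ns.
runqueue_wait_pct() {
    local before="$1" after="$2" interval_ns="$3"
    local on_cpu delay_before delay_after slices
    read -r on_cpu delay_before slices <<< "$before"
    read -r on_cpu delay_after slices <<< "$after"
    echo $(( (delay_after - delay_before) * 100 / interval_ns ))
}

# Example usage on the host (TID of the vCPU thread is an assumption);
# sleep 1 only approximates a 1e9 ns interval:
#   s1=$(cat /proc/$TID/schedstat); sleep 1; s2=$(cat /proc/$TID/schedstat)
#   runqueue_wait_pct "$s1" "$s2" 1000000000
```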
I set up a cgroup on my test server running a kernel built from the
latest tip tree.
[root]# cat cpu.cfs_quota_us
[root]# cat cpu.cfs_period_us
[root]# cat cpuset.cpus
[root]# cat cpuset.mems
Next I put the PID from the cpu thread into tasks. When I start a
script that will hog the cpu, I see the following in top on the guest:
Cpu(s): 1.9%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 48.3%hi, 0.0%si,
So the steal time here is in line with the bandwidth control settings.
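As a rough sketch of why the numbers should line up: with CFS bandwidth control, a cgroup's entitlement is cpu.cfs_quota_us divided by cpu.cfs_period_us (a quota of -1 means no limit), and a single cpu-bound vCPU capped below one CPU should see roughly the remainder as steal. The function names are hypothetical, and the values in the usage comments are illustrative only, since the actual quota and period from the session above are not preserved.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, a sketch only. Inputs are the values of
# cpu.cfs_quota_us and cpu.cfs_period_us for the vCPU thread's cgroup.

# Entitlement as an integer percentage of one CPU; a -1 quota means unlimited.
entitlement_pct() {
    local quota_us="$1" period_us="$2"
    if [ "$quota_us" -lt 0 ]; then
        echo unlimited
    else
        echo $(( quota_us * 100 / period_us ))
    fi
}

# Approximate steal a single cpu-bound vCPU should see under that cap.
expected_steal_pct() {
    local e
    e=$(entitlement_pct "$1" "$2")
    if [ "$e" = unlimited ] || [ "$e" -ge 100 ]; then
        echo 0
    else
        echo $(( 100 - e ))
    fi
}

# Illustrative values: a 50 ms quota per 100 ms period.
#   entitlement_pct 50000 100000     -> 50
#   expected_steal_pct 50000 100000  -> 50
```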
therefore work automatically. Still, the host kernel has all the
information in cgroups.
So then the steal time did not show on the guest. You have no value
that needs to be passed around. What I did not like about this approach was:
* It only works for cfs bandwidth control. If another type of hard limit
were added to the kernel, the code would potentially need to change.
This is true for almost everything we have in the kernel!
It is *very* unlikely for other bandwidth control mechanisms to ever
appear. If one ever does, it's *their* burden to make sure it works for
steal time (provided it is merged). Code in tree gets precedence.
* This approach doesn't help if the limits are set by overcommitting the
cpus. It is my understanding that this is a common approach.
I can't say anything about commonality, but common or not, it is a
When you simply overcommit, you have no way to differentiate between
intended steal time and unintended steal time. Moreover, when you
overcommit, your cpu usage will vary over time. If two guests use the
cpu to their full power, you will have 50% each. But if one of them
slows down, the other gets more. What is your entitlement value? How do
you define it?
And if, after you define it, you end up using more than that, what is
your cpu usage? 130%?
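To make the 130% question concrete: suppose an entitlement of 50% of a CPU had been picked for one of two overcommitted guests, and the other guest then slows down so this one actually gets 65%. Measured against that entitlement, usage is 65/50 = 130%. The helper name and the numbers below are hypothetical, chosen only to match the example in the text.

```shell
#!/usr/bin/env bash
# Hypothetical helper, a sketch only: actual CPU usage expressed as an
# integer percentage of a chosen entitlement (both given in % of one CPU).
usage_vs_entitlement_pct() {
    local used_pct="$1" entitled_pct="$2"
    echo $(( used_pct * 100 / entitled_pct ))
}

# Both guests busy on one CPU: each gets 50%, exactly its entitlement.
#   usage_vs_entitlement_pct 50 50   -> 100
# The other guest slows down and this one gets 65% instead:
#   usage_vs_entitlement_pct 65 50   -> 130
```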
The only sane way to do it is to communicate this value to the kernel
somehow. The bandwidth controller is the interface we have for that. So
everybody who wants to *intentionally* overcommit needs to communicate
this to the controller. IOW: any sane configuration should be explicit
about its capping.
No, that is just crazy, and I don't like it a single bit.
Add an ioctl to communicate the consign limit to the host.
This definitely should go away.
More specifically, *whatever* way we use to cap the processor, the host
system will have all the information at all times.
I'm not understanding that comment. If you are capping by simply
controlling the amount of overcommit on the host, then wouldn't you still
need some value to indicate the desired amount?
So, in light of that: whatever capping mechanism we have, we need to be
explicit about the expected entitlement. At that point, the kernel
already knows what it is, and needs no extra ioctls or anything like that.