I don't know whether there are cases where interrupts are actually
disabled for that long in the kernel, but I don't see why it couldn't
be happening today. Do we even have a way to detect such cases? I only
noticed this problem (with process accounting) while testing my
stolen-time theory below, in which I had intentionally disabled
interrupts for a long time.
So buggy code that disables interrupts for a long time could skew
process accounting, and could therefore cause stolen time to be
reported incorrectly (assuming the stolen-time idea described below
is sound).
I stumbled across this while trying to figure out the amount of time
stolen from Linux when it runs under a hypervisor.
One solution could be to ask the hypervisor directly for this
information, but in my quest for a generic solution I think the
following would work too.
The total process time accounted by the system on a CPU (system, idle,
wait, etc.), when subtracted from the amount the TSC has advanced
since boot, should tell us how much CPU time was stolen from the
kernel.
You're assuming that the tsc is always going to be advancing at a constant rate in wall-clock time? Is that a good assumption? Does VMware virtualize the tsc to make this valid? If something's going to the effort of virtualizing tsc, how do you know they're not also excluding stolen time?
Yes, the TSC is the right thing, at least for VMware here. But I'm not
advocating the TSC specifically: if it doesn't work for Xen, we could
use something else that gives a notion of total time there, read
through a paravirt call. I don't know what that would be for Xen,
but you would know better. Is there already a paravirt call that
gets that value for Xen?
What timebase is the kernel using to measure idle, system, wait, ...? Presumably something that doesn't include stolen time. In that case this just comes down to "PCPU_STOLEN = TOTAL_TIME - PCPU_UNSTOLEN_TIME", where you're proposing that TOTAL_TIME is the tsc.
Again, I'm not proposing to use the TSC; please suggest what works for Xen. As for PCPU_UNSTOLEN_TIME, I am proposing it could be the sum
of all the fields in kstat_cpu.cpustat except the steal value.
Direct use of the tsc definitely doesn't work in a Xen PV guest because the tsc is the raw physical cpu tsc; but Xen also provides everything you need to derive a globally-meaningful timebase from the tsc. Xen also provides per-vcpu info on time spent blocked, runnable (i.e., could run but no pcpu available), running and offline.
That means it should be easy to get the TOTAL_TIME value, then?