On Tue, 2007-03-06 at 16:42 -0800, Dan Hecht wrote:

Okay, to confirm I'm on the same page as you, you want to move process time accounting from being periodic sampled based to being trace based? i.e. at the system-call/interrupt boundaries, read clocksource and compute directly the amount of system/user/process time?

accounting would be wrong. Instead, we should allow the tick_sched_timer in cases (c) and (d) to have runtime configurable period, and then scale the time value accordingly before passing to account_system_time. This is probably something the Xen folks will want also, since I think Xen itself only gets 100hz hard timer, and so it can implement at best a oneshot virtual timer with 100hz resolution. Any objections to us doing something like this?

Yes. It's gross hackery.
1) We want to have a cleanup of the tick assumptions _all_ over the
place and this is going to be real hard work.
2) As I said above, the time accounting for virtualization needs to be
fixed in a generic way.
I'm not going to accept some weird hackery for virtualization, which is
of exactly ZERO value for the kernel itself. Quite the contrary it will
make the cleanup harder and introduce another hard-to-remove thing,
which will in the worst case last forever.
At least for the paravirt guests this is the correct approach. Once the
CPU vendors come up with a sane solution for a reliable and fast clock
source we might use that on real hardware as well.
Do you know if anyone has explored this? I thought there was a discussion about this a while back but it was rejected due to the sample-based approach having much lower overheads on high system call rate workloads.
Yes, with today's hardware it is simply a PITA. PowerPC has some basic
support for this though, IIRC.
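
To make the trace-based accounting discussed above concrete, here is a minimal sketch in plain C. It is not kernel code: struct task_acct, acct_init() and acct_boundary() are hypothetical names invented for illustration, and the clocksource reading is passed in as a parameter (now_ns) rather than read from real hardware. The idea is only that each user/kernel boundary charges the elapsed interval to whichever mode the task was in, instead of charging a whole tick to whatever happens to be running when the timer fires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accounting modes; not taken from the kernel. */
enum acct_mode { ACCT_USER, ACCT_SYSTEM };

/* Hypothetical per-task accounting state; not a kernel structure. */
struct task_acct {
	uint64_t user_ns;    /* accumulated user time */
	uint64_t system_ns;  /* accumulated system time */
	uint64_t last_ns;    /* clock reading at the last boundary */
	enum acct_mode mode; /* mode the task has been in since last_ns */
};

/* Start accounting at time now_ns with the task in the given mode. */
void acct_init(struct task_acct *t, uint64_t now_ns, enum acct_mode mode)
{
	t->user_ns = 0;
	t->system_ns = 0;
	t->last_ns = now_ns;
	t->mode = mode;
}

/*
 * Called at every boundary (system call entry/exit, interrupt
 * entry/exit) with a fresh clock reading. The interval since the
 * previous boundary is charged to the mode the task was in during
 * that interval, then the new mode is recorded.
 */
void acct_boundary(struct task_acct *t, uint64_t now_ns,
		   enum acct_mode new_mode)
{
	uint64_t delta;

	/* Assumes a monotonic clock; a real clocksource needs care here. */
	assert(now_ns >= t->last_ns);
	delta = now_ns - t->last_ns;

	if (t->mode == ACCT_USER)
		t->user_ns += delta;
	else
		t->system_ns += delta;

	t->last_ns = now_ns;
	t->mode = new_mode;
}
```

Note that every boundary costs one clock read, which is exactly the overhead concern raised above for workloads with a high system call rate.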