Re: [accounting regression since rc1] scheduler updates
From: Ingo Molnar
Date: Tue Aug 21 2007 - 05:36:38 EST
* Martin Schwidefsky <schwidefsky@xxxxxxxxxx> wrote:
> > hm, does on s390 scheduler_tick() get driven in virtual time or in
> > real time? The very latest scheduler code will enforce a minimum
> > rate of sched_clock() across two scheduler_tick() calls (in rc3 and
> > later kernels). If sched_clock() "slows down" but scheduler_tick()
> > still has a real-time frequency then that impacts the quality of
> > scheduling. So scheduler_tick() and sched_clock() must really have
> > the same behavior (either both are virtual or both are real), so
> > that scheduling becomes invariant to steal-time.
>
> scheduler_tick() is based on the HZ timer which uses the TOD clock =
> real time. sched_clock() currently uses the TOD clock as well so in
> regard to the new scheduler we currently do not have a problem. We
> have a problem with cpu time accounting, the change to the /proc code
> breaks the precise accounting on s390. To solve the cpu time
> accounting we need to change sched_clock() to the cpu timer = virtual
> time. To change the scheduler_tick() as well requires another patch
> and I fear it would complicate things in the s390 backend.
My feeling is that we generally get higher-quality scheduling if we
drive everything in the scheduler via virtual time. Do you agree with that?
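To show why the quoted text insists that scheduler_tick() and sched_clock() agree, here is a minimal C sketch of a "minimum rate" rule. It assumes a simplified model where each tick adds the raw clock delta and then forces the scheduler clock forward by at least one tick period. The names (struct toy_rq, toy_tick, toy_run, TOY_TICK_NSEC) are hypothetical and are not the kernel's rq clock code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tick length: HZ=100, i.e. 10 ms per scheduler_tick(). */
#define TOY_TICK_NSEC 10000000ULL

/* Hypothetical, simplified per-CPU clock state; not the kernel's struct rq. */
struct toy_rq {
	uint64_t clock;    /* scheduler's notion of elapsed time, in ns */
	uint64_t prev_raw; /* last raw sched_clock()-style reading, in ns */
};

/*
 * Model of one tick: add the raw clock delta, then force the scheduler
 * clock forward by at least one tick period (the assumed minimum rate).
 */
static void toy_tick(struct toy_rq *rq, uint64_t raw_now)
{
	uint64_t next_tick = rq->clock + TOY_TICK_NSEC;

	assert(raw_now >= rq->prev_raw); /* the raw clock is monotonic here */
	rq->clock += raw_now - rq->prev_raw;
	rq->prev_raw = raw_now;
	if (rq->clock < next_tick)
		rq->clock = next_tick;
}

/*
 * Drive `ticks` real-time ticks while the raw clock advances only
 * `raw_per_tick` ns per tick (e.g. a virtual CPU timer that stops while
 * the hypervisor steals time). Returns the resulting scheduler clock.
 */
static uint64_t toy_run(unsigned int ticks, uint64_t raw_per_tick)
{
	struct toy_rq rq = { 0, 0 };

	for (unsigned int i = 1; i <= ticks; i++)
		toy_tick(&rq, (uint64_t)i * raw_per_tick);
	return rq.clock;
}
```

With 60% of every tick stolen, toy_run(5, 4000000) returns 50000000 ns rather than 20000000 ns. The clamp makes the scheduler clock advance at the real-time tick rate, so the slowdown seen by a virtual sched_clock() is silently discarded. That is the mismatch that arises when the tick is real-time but sched_clock() is virtual.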
> And if you say that the scheduling becomes invariant to steal-time,
> how is the cpu time accounting via sum_exec supposed to work if it
> does not take steal-time into account?
Right now there are two distinct and independent things: scheduler
behavior (the scheduling decisions the scheduler makes) and accounting
behavior.
The 'invariant' I mentioned only covers scheduler behavior, not
accounting behavior. Accounting is separate in theory, but in practice
it is now coupled to the scheduler via sum_exec_runtime.
Before we write a patch to decouple them again, let's make sure we agree
on the direction to take here. There are two ways to account within a
virtual machine: either in real time or in virtual time.
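To make the two options concrete, here is a minimal C sketch, assuming each period during which a guest task was running can be split into elapsed real (TOD) time and time stolen by the hypervisor. The names (struct toy_slice, enum toy_mode, toy_account) are hypothetical and are not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one period during which a guest task was "running". */
struct toy_slice {
	uint64_t wall_ns;  /* elapsed real (TOD) time for the period */
	uint64_t steal_ns; /* part of it the hypervisor gave to other guests */
};

enum toy_mode { TOY_REAL_TIME, TOY_VIRTUAL_TIME };

/*
 * Sum a task's runtime over n periods. Real-time accounting charges the
 * whole wall-clock interval, steal included; virtual-time accounting
 * charges only the time the virtual CPU actually executed.
 */
static uint64_t toy_account(const struct toy_slice *s, size_t n,
			    enum toy_mode mode)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t charge = s[i].wall_ns;

		assert(s[i].steal_ns <= s[i].wall_ns); /* steal is part of wall time */
		if (mode == TOY_VIRTUAL_TIME)
			charge -= s[i].steal_ns;
		sum += charge;
	}
	return sum;
}
```

For a task that ran through two 10 ms periods, with 6 ms of the first one stolen, real-time accounting reports 20 ms and virtual-time accounting reports 14 ms. The 6 ms difference is the "external load" that virtual-time accounting would keep invisible inside the guest.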
It seems you'd like accounting to be sensitive to 'external load', i.e.
you'd like top running inside the guest to show the 'real' CPU
accounting, right?
Wouldn't it be more consistent if a virtual box showed no dependency on
external load? (I.e. it would transparently slow down all of its
internal functionality, without exposing that via /proc. The only way
to observe it would be through the TOD interfaces: gettimeofday and
real-time clock driven POSIX timers. Even timer_list could be driven via
virtual time, although that would probably break user expectations,
right?) Or would accounting in virtual time break user expectations too?
(Most of the other hypervisors let guests account in virtual time.)
Ingo