Re: Stolen and degraded time and schedulers

From: john stultz
Date: Tue Mar 13 2007 - 16:13:20 EST


On Tue, 2007-03-13 at 09:31 -0700, Jeremy Fitzhardinge wrote:
> The current Linux scheduler makes one big assumption: that 1ms of CPU
> time is the same as any other 1ms of CPU time, and that therefore a
> process makes the same amount of progress regardless of which particular
> ms of time it gets.
>
> This assumption is wrong now, and will become more wrong as
> virtualization gets more widely used.
>
> It's wrong now, because it fails to take into account several kinds
> of missing time:
>
> 1. interrupts - time spent in an ISR is accounted to the current
> process, even though it gets no direct benefit
> 2. SMM - time is completely lost from the kernel
> 3. slow CPUs - 1ms of 600MHz CPU is less useful than 1ms of 2.4GHz CPU
>
[snip]
> So how to deal with this? Basically we need a clock which measures "CPU
> work units", and have the scheduler use this clock.
>
> A "CPU work unit" clock has these properties:
>
> * inherently per-CPU (from the kernel's perspective, so it would be
> per-VCPU in a virtual machine)
> * monotonic - you can't do negative work
> * measured in "work units"
[snip]
> So, how to implement this?
>
> One quick hack would be to just make a new clocksource entrypoint, which
> returns work units rather than real-time cycles. That would be fairly
> simple to implement, but it doesn't really take the per-cpu nature of
> the clock into account (since it's possible that different cpus on the
> same machine might need their own methods).
>
> Perhaps a better fit would be an entity which is equivalent to a
> clocksource, but registered per-cpu like (some) clockevents.
>
> I don't have a particular preference, but I wonder what the clock gurus
> think.

My gut reaction would be to avoid using clocksources for now. While
there is some thought going into how to expand clocksources for other
uses (Daniel is working on this, for example), the design for
clocksources has been very focused on its utility to timekeeping, so I'm
hesitant to try to complicate the clocksources in order to multiplex
functionality until what is really needed is well understood.

I suspect the best approach would be to see how the sched_clock interface
can be reworked/used for what you want, as its design goals map closest
to the work-unit properties you list above.

Then we can look to see how clocksources can be best used to implement
the sched_clock interface.
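For illustration only, here is a rough userspace sketch of a per-CPU "work unit" clock layered on a sched_clock()-style nanosecond source. Everything in it (struct work_clock, work_clock_init(), work_clock_read(), work_clock_set_khz(), NR_CPUS_SKETCH) is hypothetical and is not an existing kernel interface. It assumes the raw nanosecond source is monotonic per CPU, and it models only the "slow CPUs" case by scaling elapsed time by current/maximum frequency; interrupt and SMM time are not handled.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical: stands in for real per-cpu data */

/* Hypothetical per-CPU state for a "CPU work unit" clock. */
struct work_clock {
	uint64_t last_ns;	/* raw ns at last update (sched_clock()-like source) */
	uint64_t work;		/* accumulated work units (ns at max_khz) */
	uint32_t cur_khz;	/* frequency this CPU is currently running at */
	uint32_t max_khz;	/* reference frequency: 1 work unit = 1ns here */
};

static struct work_clock work_clocks[NR_CPUS_SKETCH];

static void work_clock_init(struct work_clock *wc, uint64_t now_ns,
			    uint32_t cur_khz, uint32_t max_khz)
{
	wc->last_ns = now_ns;
	wc->work = 0;
	wc->cur_khz = cur_khz;
	wc->max_khz = max_khz;
}

/*
 * Charge the time elapsed since the last update at the current frequency
 * and return the total work done. Monotonic as long as the raw source is,
 * since each delta is non-negative. Assumes it is called often enough that
 * delta * cur_khz fits in 64 bits; integer division truncates per update.
 */
static uint64_t work_clock_read(struct work_clock *wc, uint64_t now_ns)
{
	uint64_t delta;

	assert(now_ns >= wc->last_ns);	/* raw source must not go backwards */
	delta = now_ns - wc->last_ns;
	wc->work += delta * wc->cur_khz / wc->max_khz;
	wc->last_ns = now_ns;
	return wc->work;
}

/* On a frequency change, bank the work done at the old frequency first. */
static void work_clock_set_khz(struct work_clock *wc, uint64_t now_ns,
			       uint32_t new_khz)
{
	work_clock_read(wc, now_ns);
	wc->cur_khz = new_khz;
}
```

With a 2.4GHz reference, 1ms on a CPU running at 600MHz is charged as 250000 work units (a quarter of the 1000000 a full-speed CPU earns), which is the inequality from the "slow CPUs" point expressed as a clock rather than as raw time.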

-john
