Re: Stolen and degraded time and schedulers

From: Con Kolivas
Date: Wed Mar 14 2007 - 17:32:37 EST


On Thursday 15 March 2007 08:36, Con Kolivas wrote:
> On Wednesday 14 March 2007 03:31, Jeremy Fitzhardinge wrote:
> > The current Linux scheduler makes one big assumption: that 1ms of CPU
> > time is the same as any other 1ms of CPU time, and that therefore a
> > process makes the same amount of progress regardless of which particular
> > ms of time it gets.
> >
> > This assumption is wrong now, and will become more wrong as
> > virtualization gets more widely used.
> >
> > It's wrong now, because it fails to take into account of several kinds
> > of missing time:
> >
> > 1. interrupts - time spent in an ISR is accounted to the current
> > process, even though it gets no direct benefit
> > 2. SMM - time is completely lost from the kernel
> > 3. slow CPUs - 1ms of 600MHz CPU is less useful than 1ms of 2.4GHz CPU
> >
> > The first two - time lost to interrupts - are a well known problem, and
> > are generally considered to be a non issue. If you're losing a
> > significant amount of time to interrupts, you probably have bigger
> > problems. (Or maybe not?)
> >
> > The third is not something I've seen discussed before, but it seems like
> > it could be a significant problem today. Certainly, I've noticed it
> > myself: an interactive program decides to do something CPU-intensive
> > (like start an animation), and it chugs until the conservative governor
> > brings the CPU up to speed. Certainly some of this is because its just
> > plain CPU-starved, but I think another factor is that it gets penalized
> > for running on a slow CPU: 1ms is not 1ms. And for power reasons you
> > want to encourage processes to run on slow CPUs rather than penalize
> > them.
> >
> > Virtualization just exacerbates this. If you have a busy machine
> > running multiple virtual CPUs, then each VCPU may only get a small
> > proportion of the total amount of available CPU time. If the kernel's
> > scheduler asserts that "you were just scheduled for 1ms, therefore you
> > made 1ms of progress", then many timeslices will effectively end up
> > being 1ms of 0Mhz CPU - because the VCPU wasn't scheduled and the real
> > CPU was doing something else.
> >
> >
> > So how to deal with this? Basically we need a clock which measures "CPU
> > work units", and have the scheduler use this clock.
> >
> > A "CPU work unit" clock has these properties:
> >
> > * inherently per-CPU (from the kernel's perspective, so it would be
> > per-VCPU in a virtual machine)
> > * monotonic - you can't do negative work
> > * measured in "work units"
> >
> > A "work unit" is probably most simply expressed in cycles - you assume a
> > cycle of CPU time is equivalent in terms of work done to any other
> > cycle. This means that 1 cycle at 600MHz is equivalent to 1 cycle at
> > 2.4GHz - but of course the 2.4GHz processor gets 4 times as many in any
> > real time interval. (This is the instance where the worst kind of tsc -
> > varying speed which stops on idle - is actually exactly what you want.)
> >
> > You could also measure "work units" in terms of normalized time units:
> > if the fastest CPU on the machine is 2.4GHz, then 1ms is 1ms a work unit
> > on that CPU, but 250us on the 600MHz CPU.
> >
> > It doesn't really matter what the unit is, so long as it is used
> > consistently to measure how much progress all processes made.
>
> I think you're looking for a complex solution to a problem that doesn't
> exist. The job of the process scheduler is to meter out the available cpu
> resources. It cannot make up cycles for a slow cpu or one that is
> throttled. If the problem is happening due to throttling it should be fixed
> by altering the throttle. The example you describe with the conservative
> governor is as easy to fix as changing to the ondemand governor.
> Differential power cpus on an SMP machine should be managed by SMP
> balancing choices based on power groups.
>
> It would be fine to implement some other accounting of this definition of
> time for other purposes

I mean such as for virtualisation purposes.

> but not for process scheduler decisions per se.

>
> Sorry to chime in late. My physical condition prevents me spending any
> extended period of time at the computer so I've tried to be succinct with
> my comments and may not be able to reply again.

--
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/