Re: fair clock use in CFS

From: William Lee Irwin III
Date: Mon May 14 2007 - 06:29:12 EST


On Mon, May 14, 2007 at 02:03:58PM +0530, Srivatsa Vaddagiri wrote:
> I have been brooding over how the fair clock is computed/used in
> CFS and thought I would ask the experts to avoid wrong guesses!
> As I understand it, fair_clock is a monotonically increasing clock which
> advances at a pace inversely proportional to the load on the runqueue.
> If load = 1 (task), it advances at the same pace as the wall clock; as
> load increases, it advances more slowly than the wall clock.
> In addition, the following calculations depend on the fair clock: a
> task's wait time on the runqueue and its sleep time outside the runqueue
> (both reflected in p->wait_run_time).

It's not hard to see that that's a mistake. The great thing about virtual
deadline schedulers, though, is that all it takes to completely rewrite
the policy is changing the numbers calculated for these things.
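
For reference, here is a minimal sketch (not the actual CFS code; the
structure and function names are made up for illustration) of the
behaviour Srivatsa describes: the fair clock advances by the wall-clock
delta divided by the runqueue load, so it tracks the wall clock with one
runnable task and falls behind by a factor of N with N of them.

struct toy_rq {
	unsigned long nr_running;		/* runnable tasks ("load") */
	unsigned long long fair_clock;		/* advances at 1/load of wall speed */
};

static void toy_update_fair_clock(struct toy_rq *rq,
				  unsigned long long wall_delta_ns)
{
	unsigned long load = rq->nr_running ? rq->nr_running : 1;

	/* load == 1: tracks the wall clock exactly;
	 * N runnable tasks: advances at 1/N of wall-clock speed */
	rq->fair_clock += wall_delta_ns / load;
}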


On Mon, May 14, 2007 at 02:03:58PM +0530, Srivatsa Vaddagiri wrote:
> A few questions that come up are:
> 1. Why can't the fair clock be the same as the wall clock at all times,
> i.e. the fair clock progresses at the same pace as the wall clock,
> independent of the load on the runqueue?
> Wouldn't that still give the ability to measure time spent waiting on
> the runqueue or sleeping, and to use that measured time to give
> latency/bandwidth credit?
> In the case of EEVDF, the use of a virtual clock seems more
> understandable, given that each client gets 'wi' real
> time units in 1 virtual time unit. That doesn't seem to be the case in
> CFS, as Ting Yang explained with the +/- lag analysis here:
> http://lkml.org/lkml/2007/5/2/612

It's not just more understandable; it also doesn't break down with
increasing numbers of tasks.
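
For comparison, a rough sketch of the EEVDF bookkeeping being referred
to (hypothetical names; see the paper for the real definitions): virtual
time advances at 1/sum(w_j) of real time, so a client with weight w_i
receives w_i real-time units per unit of virtual time no matter how many
tasks are runnable, and a request for r_i units gets the virtual
deadline d_i = e_i + r_i/w_i.

struct toy_eevdf_task {
	unsigned long w;			/* weight w_i */
	unsigned long long ve;			/* virtual eligible time e_i */
	unsigned long long vd;			/* virtual deadline d_i */
};

/* dV = dt / sum(w_j): each weight-w_i client gets w_i real-time
 * units per unit of virtual time, independent of how many run */
static void toy_advance_vtime(unsigned long long *V,
			      unsigned long long real_delta,
			      unsigned long total_weight)
{
	*V += real_delta / total_weight;
}

/* d_i = e_i + r_i / w_i for a request of r_i real-time units */
static void toy_set_deadline(struct toy_eevdf_task *t,
			     unsigned long long r)
{
	t->vd = t->ve + r / t->w;
}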


On Mon, May 14, 2007 at 02:03:58PM +0530, Srivatsa Vaddagiri wrote:
> 2. Preemption granularity - sysctl_sched_granularity
> This seems to be measured on the fair clock scale rather than the
> wall clock scale. As a consequence, the time taken for a task
> to relinquish the CPU to its competition depends on the number N
> of tasks? For example: if there are a million CPU-hungry tasks, then
> the time taken to switch between two tasks is longer than in the
> case where just two CPU-hungry tasks are running. Is there
> any advantage to using the fair clock scale to detect preemption points?

I'm not convinced this is a great way to mitigate context-switch
overhead. I'd recommend heuristically adjusting the latency parameters
(l_i) to mitigate that overhead, or to otherwise express the
tradeoff between latency and throughput, instead of introducing an
arbitrary delay like sched_granularity_ns. I suspect CFS will have to
restart from a point much closer to EEVDF for that to be effective.
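
To make the scaling concern concrete, a back-of-the-envelope
illustration with made-up numbers: if the granularity is expressed in
fair-clock units, the wall-clock interval between forced switches grows
linearly with nr_running, since the fair clock runs nr_running times
slower than the wall clock.

#include <stdio.h>

int main(void)
{
	unsigned long long gran_ns = 2000000ULL;	/* 2ms, fair-clock units */
	unsigned long nr_running[] = { 2, 1000, 1000000 };
	unsigned int i;

	for (i = 0; i < 3; i++)
		/* wall-clock preemption interval = granularity * nr_running */
		printf("nr_running=%lu -> wall-clock interval ~%llu ns\n",
		       nr_running[i], gran_ns * nr_running[i]);
	return 0;
}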


-- wli