Re: [patch] fix SMT scheduler latency bug

From: Con Kolivas
Date: Wed Jun 22 2005 - 09:47:47 EST


Hi

On Wed, 22 Jun 2005 20:25, Ingo Molnar wrote:
> William Weston reported unusually high scheduling latencies on his x86
> HT box, on the -RT kernel. I managed to reproduce it on my HT box and
> the latency tracer shows the incident in action:

Thanks for picking this up. I've had a long hard look at the code and your
patch.

> the reason for this anomaly is the following code in dependent_sleeper():
>
>	/*
>	 * If a user task with lower static priority than the
>	 * running task on the SMT sibling is trying to schedule,
>	 * delay it till there is proportionately less timeslice
>	 * left of the sibling task to prevent a lower priority
>	 * task from using an unfair proportion of the
>	 * physical cpu's resources. -ck
>	 */
>	[...]
>	if (((smt_curr->time_slice * (100 -
>			sd->per_cpu_gain) / 100) > task_timeslice(p)))
>		ret = 1;
>
> note that in contrast to the comment above, we don't actually do the
> check based on static priority, we do the check based on timeslices. But
> timeslices go up and down, and even highprio tasks can randomly have
> very low timeslices (just before their next refill) and can thus be
> judged as 'lowprio' by the above piece of code.

I don't see it like that. task_timeslice(p) will always return the same value,
based purely on static priority, and smt_curr->time_slice cannot ever be
larger than task_timeslice(p) unless there is a significant enough 'nice'
difference. It is not smt_curr that is rescheduled as a result of this test;
it is p that is not scheduled, and we look at p's task_timeslice(), which does
not alter. In either case, whether a task is delayed depends on its static
priority, which determines its task_timeslice(), versus the current value of
->time_slice on the sibling, which drains as that task runs and is expected
to fluctuate.
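
To make the arithmetic concrete, here is a rough userspace sketch of the
quoted test. It is not the kernel code: the model_* names are hypothetical,
the timeslice formula is only modelled on the 2.6-era task_timeslice(), times
are in milliseconds rather than jiffies, and the per_cpu_gain value of 25 is
an assumption about the SMT sibling domain.

```c
#include <assert.h>

/* Assumed constants, loosely modelled on the 2.6-era scheduler (ms, not jiffies). */
#define MODEL_MAX_PRIO       140
#define MODEL_MAX_USER_PRIO  40
#define MODEL_DEF_TIMESLICE  100
#define MODEL_MIN_TIMESLICE  5
#define MODEL_PER_CPU_GAIN   25   /* assumed sd->per_cpu_gain for SMT siblings */

/* Map a nice level (-20..19) to a static priority (100..139). */
int model_nice_to_prio(int nice)
{
	return 120 + nice;
}

/* Scale a base slice by static priority, never going below the minimum. */
int model_scale_prio(int base, int static_prio)
{
	int slice = base * (MODEL_MAX_PRIO - static_prio) / (MODEL_MAX_USER_PRIO / 2);

	return slice > MODEL_MIN_TIMESLICE ? slice : MODEL_MIN_TIMESLICE;
}

/* Full timeslice for a task: a function of static priority only. */
int model_task_timeslice(int static_prio)
{
	assert(static_prio >= 100 && static_prio < MODEL_MAX_PRIO);

	if (static_prio < model_nice_to_prio(0))
		return model_scale_prio(MODEL_DEF_TIMESLICE * 4, static_prio);
	return model_scale_prio(MODEL_DEF_TIMESLICE, static_prio);
}

/*
 * The quoted condition: delay p while the sibling's remaining slice,
 * scaled by (100 - per_cpu_gain)%, still exceeds p's full timeslice.
 */
int model_sibling_delays_p(int smt_curr_time_slice, int p_nice)
{
	int scaled = smt_curr_time_slice * (100 - MODEL_PER_CPU_GAIN) / 100;

	return scaled > model_task_timeslice(model_nice_to_prio(p_nice));
}
```

Under these assumed numbers a nice-0 sibling with a full 100 ms slice never
delays another nice-0 task (75 is not greater than 100), while a nice-10 task
with a 50 ms timeslice is held back until the sibling's remaining slice falls
to 67 ms or below.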

> This condition is
> clearly buggy. The correct test is to check for static_prio _and_ to
> check for the preemption priority. Even on different static priority
> levels, a higher-prio interactive task should not be delayed due to a
> higher-static-prio CPU hog.

> - if (((smt_curr->time_slice * (100 - sd->per_cpu_gain) /
> - 100) > task_timeslice(p)))
> + if (smt_curr->static_prio < p->static_prio &&
> + !TASK_PREEMPTS_CURR(p, smt_rq) &&
> + smt_slice(smt_curr, sd) > task_timeslice(p))

Checking for smt_curr->static_prio < p->static_prio appears redundant to me,
because the timeslice condition can only be met if there is a significant
difference in static priority, as I mentioned above.

> + if (TASK_PREEMPTS_CURR(p, smt_rq) &&

Is this check necessary? The proportion is supposed to be distributed
according to static priority only.

If this code is causing large latencies then I believe it can only occur with
different nice levels running on siblings, and with high priority tasks
starting new timeslices repeatedly and never getting to the last
per_cpu_gain% of their timeslice. Ingo, do you think this might be what is
being seen? If this truly can happen then this code will have to move to a
jiffy-based proportion, as the real time code is, to prevent this problem.
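
As a rough illustration of what a jiffy-based proportion could look like, the
sketch below decides purely from the current jiffy count, so a sibling that
keeps starting fresh timeslices cannot extend the delay. The model_* names,
the 100-jiffy window and the per_cpu_gain value of 25 are all assumptions for
illustration, not the kernel's actual code or a proposed patch.

```c
/* Assumed values: HZ=1000, so a 100-jiffy window is 100 ms. */
#define MODEL_WINDOW_JIFFIES   100UL
#define MODEL_JIFFY_CPU_GAIN   25UL   /* assumed sd->per_cpu_gain */

/*
 * Delay p whenever the position inside the current window is past the
 * per_cpu_gain% mark. The sibling's ->time_slice plays no part here.
 */
int model_jiffy_delays_p(unsigned long now_jiffies)
{
	unsigned long pos = now_jiffies % MODEL_WINDOW_JIFFIES;

	return pos > MODEL_JIFFY_CPU_GAIN * MODEL_WINDOW_JIFFIES / 100UL;
}

/* Count the jiffies in [start, start + len) during which p may run. */
unsigned long model_runnable_jiffies(unsigned long start, unsigned long len)
{
	unsigned long i, count = 0;

	for (i = 0; i < len; i++)
		if (!model_jiffy_delays_p(start + i))
			count++;
	return count;
}
```

With these numbers p may run during jiffies 0 to 25 of every window, so its
share over any long interval is bounded from below no matter how the
sibling's timeslice fluctuates.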

Cheers,
Con
