Re: [patchlet] Re: Scheduler bug related to rq->skip_clock_update?

From: Peter Zijlstra
Date: Tue Dec 07 2010 - 11:41:40 EST


On Mon, 2010-12-06 at 09:32 +0100, Mike Galbraith wrote:

> kernel/fork.c | 1 +
> kernel/sched.c | 6 +++---
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> Index: linux-2.6.37.git/kernel/sched.c
> ===================================================================
> --- linux-2.6.37.git.orig/kernel/sched.c
> +++ linux-2.6.37.git/kernel/sched.c
> @@ -660,6 +660,7 @@ inline void update_rq_clock(struct rq *r
>
> sched_irq_time_avg_update(rq, irq_time);
> }
> + rq->skip_clock_update = 0;
> }
>
> /*

Shouldn't we do that at the end of schedule()? Since the purpose of
->skip_clock_update is to avoid multiple calls to:
- avoid overhead
- ensure scheduling is accounted at a single point

[ for that latter purpose it might also make sense to put that point
somewhere around context_switch() but due to the fact that we need a
clock update early that's a bit impractical. ]

Hmm?

> @@ -2138,7 +2139,7 @@ static void check_preempt_curr(struct rq
> * A queue event has occurred, and we're going to schedule. In
> * this case, we can save a useless back to back clock update.
> */
> - if (test_tsk_need_resched(rq->curr))
> + if (rq->curr->se.on_rq && test_tsk_need_resched(rq->curr))
> rq->skip_clock_update = 1;
> }

OK, I initially tried to replace the test with a return value of
->check_preempt_curr() and such, but that turns into a lot of code and
won't necessarily be any better.

> @@ -3854,7 +3855,6 @@ static void put_prev_task(struct rq *rq,
> {
> if (prev->se.on_rq)
> update_rq_clock(rq);
> - rq->skip_clock_update = 0;
> prev->sched_class->put_prev_task(rq, prev);
> }

See the first note.

> @@ -3912,7 +3912,6 @@ need_resched_nonpreemptible:
> hrtick_clear(rq);
>
> raw_spin_lock_irq(&rq->lock);
> - clear_tsk_need_resched(prev);
>
> switch_count = &prev->nivcsw;
> if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
> @@ -3942,6 +3941,7 @@ need_resched_nonpreemptible:
> if (unlikely(!rq->nr_running))
> idle_balance(cpu, rq);
>
> + clear_tsk_need_resched(prev);
> put_prev_task(rq, prev);
> next = pick_next_task(rq);

Good find, this needs to be done after the idle balancing because that
can release the rq->lock and allow for TIF_NEED_RESCHED to be set again.

Maybe complement this with a WARN_ON_ONCE(test_tsk_need_resched(next))
somewhere after pick_next_task() so as to ensure that !current has !
TIF_NEED_RESCHED.

> Index: linux-2.6.37.git/kernel/fork.c
> ===================================================================
> --- linux-2.6.37.git.orig/kernel/fork.c
> +++ linux-2.6.37.git/kernel/fork.c
> @@ -275,6 +275,7 @@ static struct task_struct *dup_task_stru
>
> setup_thread_stack(tsk, orig);
> clear_user_return_notifier(tsk);
> + clear_tsk_need_resched(tsk);
> stackend = end_of_stack(tsk);
> *stackend = STACK_END_MAGIC; /* for overflow detection */
>

OK.. have we looked if there's more TIF flags that could do with a
reset?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/