Re: [tip:sched/urgent] sched: Fix cross-cpu clock sync on remotewakeups

From: Peter Zijlstra
Date: Thu Jun 02 2011 - 11:45:14 EST


On Thu, 2011-06-02 at 22:23 +0800, Yong Zhang wrote:
> On Thu, Jun 02, 2011 at 03:04:26PM +0200, Peter Zijlstra wrote:
> > On Thu, 2011-06-02 at 15:52 +0800, Yong Zhang wrote:
> > > In sched_clock_local(), clock is calculated around ->tick_gtod even if
> > > that ->tick_gtod is stale for long time because we stays in idle state.
> > > You know ->tick_gtod is only updated in sched_clock_tick();
> >
> > (well, no, there's idle callbacks as you said below)
> >
> > > IOW, when a cpu goes out of idle, sched_clock_tick() is called from
> > > tick_nohz_stop_idle() which is later than interrupt.
> >
> > Gah, that would be awefull and mean wakeups from interrupts were already
> > borken. /me goes look at code.
> >
> > irq_enter() -> tick_check_idle() -> tick_check_nohz() ->
> > tick_nohz_stop_idle() -> sched_clock_idle_wakeup_event()
> >
> > should update the thing before we run any isrs, right?
>
> Hmmm, you are right.
>
> But smp_reschedule_interrupt() doesn't call irq_enter()/irq_exit(),
> is that correct?

Crap.. you're right. And I bet other archs don't do that either. With
NO_HZ you really need irq_enter() for pretty much all interrupts so I
was assuming the resched IPI had it, but its been special and never
really needed it. If it would wake an idle cpu the idle loop exit would
deal with it, if it interrupted userspace the thing was running and
NO_HZ wasn't relevant.

Damn.

And yes, the only reason I didn't see this on my dev box was because we
do indeed set that sched_clock_stable thing on wsm. And I never noticed
on my desktop because firefox/X/etc. consuming heaps of CPU isn't weird
at all.

Adding it to all resched int handlers is of course a possibility but
would slow down the thing, although with the new code, most users are
now indeed wakeups (excepting weird and wonderful users like KVM).

We could of course add it in sched.c since the logic recurses just
fine.. its not pretty though.. :/

Thoughts?

---
kernel/sched.c | 18 +++++++++++++++++-
1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 2fe98ed..365ed6b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2554,7 +2554,23 @@ static void sched_ttwu_pending(void)

void scheduler_ipi(void)
{
- sched_ttwu_pending();
+ struct rq *rq = this_rq();
+ struct task_struct *list = xchg(&rq->wake_list, NULL);
+
+ if (!list)
+ return;
+
+ irq_enter();
+ raw_spin_lock(&rq->lock);
+
+ while (list) {
+ struct task_struct *p = list;
+ list = list->wake_entry;
+ ttwu_do_activate(rq, p, 0);
+ }
+
+ raw_spin_unlock(&rq->lock);
+ irq_exit();
}

static void ttwu_queue_remote(struct task_struct *p, int cpu)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/