Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()

From: Peter Zijlstra
Date: Tue Jun 07 2011 - 05:32:05 EST


On Mon, 2011-06-06 at 18:46 +0200, Oleg Nesterov wrote:
> On 06/06, Peter Zijlstra wrote:
> >
> > You're right, p->pi_lock for wakeups, rq->lock for runnable tasks.
>
> Good, thanks.
>
> Help! I have another question.
>
> try_to_wake_up:
>
> 	raw_spin_lock_irqsave(&p->pi_lock, flags);
> 	if (!(p->state & state))
> 		goto out;
>
> 	cpu = task_cpu(p);
>
> 	if (p->on_rq && ttwu_remote(p, wake_flags))
> 		goto stat;
>
> This does look a bit confusing: we can't trust "cpu = task_cpu" before
> we check ->on_rq. OK, not a problem, this cpu number can only be used in
> ttwu_stat(cpu).
>
> But ttwu_stat(cpu) in turn does
>
> 	if (cpu != task_cpu(p))
> 		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
>
> Ignoring the theoretical races with pull_task/etc, how is it possible
> that cpu != task_cpu(p) ? Another caller is try_to_wake_up_local(), it
> obviously can't trigger this case.
>
> This looks broken to me. Looking at its name, I guess nr_wakeups_migrate
> should be incremented if ttwu does set_task_cpu(), correct?
>
> IOW, don't we need something like the (untested/uncompiled) patch below?
> _If_ I am right, I can resend it with the changelog/etc but please feel
> free to make another fix.

You're right, I spotted the same a few days ago which resulted in:

---
commit f339b9dc1f03591761d5d930800db24bc0eda1e1
Author: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Date: Tue May 31 10:49:20 2011 +0200

sched: Fix schedstat.nr_wakeups_migrate

While looking over the code I found that with the ttwu rework the
nr_wakeups_migrate test broke since we now switch cpus prior to
calling ttwu_stat(), hence the test is always true.

Cure this by passing the migration state in wake_flags. Also move the
whole test under CONFIG_SMP, it's hard to migrate tasks on UP :-)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Link: http://lkml.kernel.org/n/tip-pwwxl7gdqs5676f1d4cx6pj7@xxxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8da84b7..483c1ed 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1063,6 +1063,7 @@ struct sched_domain;
  */
 #define WF_SYNC		0x01		/* waker goes to sleep after wakup */
 #define WF_FORK		0x02		/* child wakeup after fork */
+#define WF_MIGRATED	0x04		/* internal use, task got migrated */

 #define ENQUEUE_WAKEUP		1
 #define ENQUEUE_HEAD		2
diff --git a/kernel/sched.c b/kernel/sched.c
index 49cc70b..2fe98ed 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2447,6 +2447,10 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 		}
 		rcu_read_unlock();
 	}
+
+	if (wake_flags & WF_MIGRATED)
+		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
+
 #endif /* CONFIG_SMP */

 	schedstat_inc(rq, ttwu_count);
@@ -2455,9 +2459,6 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 	if (wake_flags & WF_SYNC)
 		schedstat_inc(p, se.statistics.nr_wakeups_sync);

-	if (cpu != task_cpu(p))
-		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
-
 #endif /* CONFIG_SCHEDSTATS */
 }

@@ -2675,8 +2676,10 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		p->sched_class->task_waking(p);

 	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
-	if (task_cpu(p) != cpu)
+	if (task_cpu(p) != cpu) {
+		wake_flags |= WF_MIGRATED;
 		set_task_cpu(p, cpu);
+	}
 #endif /* CONFIG_SMP */

 	ttwu_queue(p, cpu);
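
For anyone following along without a kernel tree handy, here is a rough user-space sketch of why the old check stopped counting and what the flag changes. This is not kernel code: struct task, wake(), stat_old(), stat_new() and count_after_migration() are made-up stand-ins, and the target_cpu argument only plays the role of whatever select_task_rq() would return. Only the WF_* values are taken from the patch above.

```c
#include <assert.h>	/* only needed for the checks mentioned below */

#define WF_SYNC		0x01
#define WF_FORK		0x02
#define WF_MIGRATED	0x04	/* same value as in the patch */

/* Hypothetical stand-in for the bits of task_struct we care about. */
struct task {
	int cpu;
	unsigned long nr_wakeups_migrate;
};

/* Old test: compares against the task's cpu after it was already switched. */
void stat_old(struct task *p, int cpu)
{
	if (cpu != p->cpu)
		p->nr_wakeups_migrate++;
}

/* New test: the wakeup path records the migration in wake_flags. */
void stat_new(struct task *p, int wake_flags)
{
	if (wake_flags & WF_MIGRATED)
		p->nr_wakeups_migrate++;
}

/* Simplified wakeup: target_cpu models the result of select_task_rq(). */
void wake(struct task *p, int target_cpu, int wake_flags, int use_flag)
{
	int cpu = target_cpu;

	if (p->cpu != cpu) {
		wake_flags |= WF_MIGRATED;
		p->cpu = cpu;		/* models set_task_cpu() */
	}

	if (use_flag)
		stat_new(p, wake_flags);
	else
		stat_old(p, cpu);
}

/* Migrate count after a single wakeup that moves the task from CPU 0 to 1. */
unsigned long count_after_migration(int use_flag)
{
	struct task p = { .cpu = 0, .nr_wakeups_migrate = 0 };

	wake(&p, 1, 0, use_flag);
	return p.nr_wakeups_migrate;
}
```

In this model, assert(count_after_migration(0) == 0) and assert(count_after_migration(1) == 1) both hold: the comparison-based test never sees the migration because the cpu was already updated, while the flag-based test counts it once.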
