Re: [GIT PULL] scheduler changes for v5.3

From: Peter Zijlstra
Date: Wed Jul 10 2019 - 06:57:47 EST


On Tue, Jul 09, 2019 at 10:48:49PM -0700, John Stultz wrote:
> On Mon, Jul 8, 2019 at 9:33 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> > Please pull the latest sched-core-for-linus git tree from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-for-linus
> ....
> > Peter Zijlstra (1):
> > sched/core: Optimize try_to_wake_up() for local wakeups
>
> Hey Peter, Ingo,
> Since this change landed in Linus' tree, I've been seeing a lot of
> the following dmesg noise when running AOSP on the HiKey960 board.
>
> [ 173.162712] CPU: 2 PID: 731 Comm: ndroid.systemui Tainted: G S
> 5.2.0-rc5-00110-g6751c43d94d6 #447
> [ 173.162721] Hardware name: HiKey960 (DT)
> [ 173.171194] caller is try_to_wake_up+0x3e4/0x788
> [ 173.179605] Call trace:
> [ 173.179617] dump_backtrace+0x0/0x140
> [ 173.179626] show_stack+0x14/0x20
> [ 173.179638] dump_stack+0x9c/0xc4
> [ 173.179649] debug_smp_processor_id+0x148/0x150
> [ 173.179659] try_to_wake_up+0x3e4/0x788
> [ 173.179669] wake_up_q+0x5c/0x98
> [ 173.179681] futex_wake+0x170/0x1a8
> [ 173.179696] do_futex+0x560/0xf30
> [ 173.284541] __arm64_sys_futex+0xfc/0x148
> [ 173.288570] el0_svc_common.constprop.0+0x64/0x188
> [ 173.293371] el0_svc_handler+0x28/0x78
> [ 173.297131] el0_svc+0x8/0xc
> [ 173.300045] CPU: 0 PID: 1258 Comm: Binder:363_5 Tainted: G S
> 5.2.0-rc5-00110-g6751c43d94d6 #447
> [ 173.301130] BUG: using smp_processor_id() in preemptible [00000000]
> code: ndroid.systemui/731
> [ 173.310074] Hardware name: HiKey960 (DT)
> [ 173.310084] Call trace:
> [ 173.310112] dump_backtrace+0x0/0x140
> [ 173.310131] show_stack+0x14/0x20
> [ 173.318685] caller is try_to_wake_up+0x3e4/0x788
> [ 173.322583] dump_stack+0x9c/0xc4
> [ 173.322595] debug_smp_processor_id+0x148/0x150
> [ 173.322605] try_to_wake_up+0x3e4/0x788
> [ 173.322615] wake_up_q+0x5c/0x98
> [ 173.322628] futex_wake+0x170/0x1a8
> [ 173.322641] do_futex+0x560/0xf30
> [ 173.358367] __arm64_sys_futex+0xfc/0x148
> [ 173.362397] el0_svc_common.constprop.0+0x64/0x188
> [ 173.367199] el0_svc_handler+0x28/0x78
> [ 173.370956] el0_svc+0x8/0xc
>

Urgh.. how ever didn't we find that before :/ stupid stats.

Something like the below ought to fix, but let me see if I can come up
with something saner...

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 108449526f11..0b22e55cebe8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2399,6 +2399,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	unsigned long flags;
 	int cpu, success = 0;

+	preempt_disable();
 	if (p == current) {
 		/*
 		 * We're waking current, this means 'p->on_rq' and 'task_cpu(p)
@@ -2412,7 +2413,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		 * it disabling IRQs (this allows not taking ->pi_lock).
 		 */
 		if (!(p->state & state))
-			return false;
+			goto out;

 		success = 1;
 		cpu = task_cpu(p);
@@ -2526,6 +2527,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 out:
 	if (success)
 		ttwu_stat(p, cpu, wake_flags);
+	preempt_enable();

 	return success;
 }
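
For anyone following along, here is a rough userspace model of why the
patch both adds the preempt_disable()/preempt_enable() pair and turns
the early 'return false' into 'goto out'. It is only a sketch, not kernel
code: every model_* name is made up for illustration, the real debug
check looks at more than the preempt count (IRQ state, for one), and the
assumption that the stats hook is what ends up reading the CPU id is
taken from the "stupid stats" remark and the trace above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel symbols. */
static int model_preempt_count;  /* 0 means "preemptible" */
static int model_bug_reports;    /* times the debug check fired */

static void model_preempt_disable(void) { model_preempt_count++; }
static void model_preempt_enable(void)  { model_preempt_count--; }

/* Simplified debug check: complain when called while preemptible. */
static int model_smp_processor_id(void)
{
	if (model_preempt_count == 0)
		model_bug_reports++;
	return 0; /* the model has a single CPU */
}

/* Stand-in for the stats hook, assumed to be what reads the CPU id. */
static void model_ttwu_stat(void)
{
	(void)model_smp_processor_id();
}

/*
 * Same shape as the patched function: one disable on entry and one
 * enable at the single exit, so the early bail-out has to jump to
 * 'out' rather than return directly.
 */
static bool model_wake(bool state_matches)
{
	bool success = false;

	model_preempt_disable();
	if (!state_matches)
		goto out; /* a bare 'return' here would leak the disable */

	success = true;
out:
	if (success)
		model_ttwu_stat();
	model_preempt_enable();
	return success;
}
```

Calling model_wake() on either path leaves model_preempt_count back at
zero and never trips the check, whereas calling model_ttwu_stat() on its
own with the count at zero does trip it, which is the situation the splat
is complaining about.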