Re: [patch] timer: Fix timers_update_migration(), and call it in tmigr_init()

From: Paul E. McKenney
Date: Sun Apr 30 2017 - 18:49:46 EST


On Sat, Apr 29, 2017 at 09:36:37PM -0700, Paul E. McKenney wrote:
> On Sun, Apr 30, 2017 at 06:20:15AM +0200, Mike Galbraith wrote:
> > On Sat, 2017-04-29 at 20:43 -0700, Paul E. McKenney wrote:
> > > On Sun, Apr 30, 2017 at 03:21:58AM +0200, Mike Galbraith wrote:
> > > > On Sat, 2017-04-29 at 14:45 -0700, Paul E. McKenney wrote:
> > > > > On Sat, Apr 29, 2017 at 08:20:33PM +0200, Mike Galbraith wrote:
> > > > > > On Sat, 2017-04-29 at 11:06 -0700, Paul E. McKenney wrote:
> > > > > >
> > > > > > > If someone will either repost a fresh series or point me at exactly
> > > > > > > the set of patches to use, I will run it through rcutorture again.
> > > > > >
> > > > > > Patchlet is against x86-tip/master.today.
> > > > >
> > > > > So today's (as in Saturday April 29) x86-tip/master with the following
> > > > > patch applied?
> > > >
> > > > Yeah.
> > >
> > > OK, will fire it up once the current set of overnight tests complete.
> >
> > I certainly don't want to discourage you from beating hell outta tip,
> > just want to make sure you know that I'm seeing zero RCU woes, only
> > late timer expiry (sharpening rocks/sticks to focus trace).
>
> I got timer_migration splats from an earlier rcutorture run. Please see
> message-ID <20170421192853.GD3956@xxxxxxxxxxxxxxxxxx> on LKML on April
> 21st in reply to Thomas's V2 00/10 cover letter. So I am curious to
> learn if your patches fix them.

And sadly, the splats are still there. Please see the following for
the relevant console output and .config files:

http://www2.rdrop.com/users/paulmck/submission/TREE04.2017.04.30a.config
http://www2.rdrop.com/users/paulmck/submission/TREE04.2017.04.30a.console.log
http://www2.rdrop.com/users/paulmck/submission/TREE04.3.2017.04.30a.console.log

http://www2.rdrop.com/users/paulmck/submission/TREE07.2017.04.30a.config
http://www2.rdrop.com/users/paulmck/submission/TREE07.2.2017.04.30a.bzImage
http://www2.rdrop.com/users/paulmck/submission/TREE07.2017.04.30a.console.log
http://www2.rdrop.com/users/paulmck/submission/TREE07.2.2017.04.30a.console.log

Please let me know if you have any trouble accessing these.

Here is the first splat from the first TREE04 run:

[ 3.310642] WARNING: CPU: 1 PID: 0 at /home/paulmck/public_git/timer-tip/kernel/time/timer_migration.c:387 tmigr_set_cpu_active+0xc6/0xe0
[ 3.313210] Modules linked in:
[ 3.313861] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc8+ #1
[ 3.315196] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3.317196] task: ffff8c1fde9f0000 task.stack: ffff8e4fc0124000
[ 3.318433] RIP: 0010:tmigr_set_cpu_active+0xc6/0xe0
[ 3.319464] RSP: 0000:ffff8e4fc0127e90 EFLAGS: 00010046
[ 3.320598] RAX: 0000000000000004 RBX: 0000000000000001 RCX: 000000000000001f
[ 3.322146] RDX: 0000000000000001 RSI: ffff8c1fdfc54cc8 RDI: ffff8c1fdeb26f80
[ 3.323652] RBP: ffff8e4fc0127ea8 R08: 0000000000000000 R09: 0000000000000008
[ 3.325237] R10: ffff8e4fc0127e80 R11: 0000000000000400 R12: ffff8c1fdeb26f80
[ 3.326699] R13: ffff8c1fdfc54cc8 R14: 0000000000000000 R15: 0000000000000000
[ 3.328149] FS: 0000000000000000(0000) GS:ffff8c1fdfc40000(0000) knlGS:0000000000000000
[ 3.329845] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.331078] CR2: ffff8e4fc02f0000 CR3: 0000000015e0a000 CR4: 00000000000006e0
[ 3.332509] Call Trace:
[ 3.333107] tmigr_cpu_activate+0x36/0x40
[ 3.333972] tick_nohz_idle_exit+0xd1/0xf0
[ 3.334845] do_idle+0x113/0x170
[ 3.335501] cpu_startup_entry+0x18/0x20
[ 3.336338] start_secondary+0xe8/0xf0
[ 3.337147] secondary_startup_64+0x9f/0x9f
[ 3.337998] Code: d0 48 8b 03 48 85 c0 75 eb eb a0 49 8b 7c 24 50 41 89 5c 24 08 48 85 ff 74 8c 49 8d 74 24 20 89 da e8 3f ff ff ff e9 7b ff ff ff <0f> ff 41 c6 04 24 00 5b 41 5c 41 5d 5d c3 66 90 66 2e 0f 1f 8

This is the first WARN_ON() in tmigr_set_cpu_active(). I got four splats
across 12 hours of the rcutorture TREE04 test scenario, that is, three
runs of four hours each.
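
For anyone wanting to tally splats in the console logs themselves, here
is a minimal sketch. It assumes only that each splat begins with a
"WARNING: CPU: ... at <file>:<line> <symbol>+<offset>" header line like
the one in the trace above. The count_splats() helper is hypothetical
and is not part of the rcutorture scripting. It will not count oopses
such as page faults, which use a different header.

```python
import re
from collections import Counter

# Header line of a kernel WARNING splat, for example:
# "[ 3.310642] WARNING: CPU: 1 PID: 0 at <file>:<line> <symbol>+0xc6/0xe0"
SPLAT_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+) (\S+)")


def count_splats(lines):
    """Count WARNING splats, keyed by (source file, line, function).

    Hypothetical helper: it matches only the header format shown above.
    """
    counts = Counter()
    for line in lines:
        match = SPLAT_RE.search(line)
        if match is None:
            continue
        path, lineno, symbol = match.groups()
        filename = path.rsplit("/", 1)[-1]   # drop the build directory
        function = symbol.split("+", 1)[0]   # drop the +offset/size suffix
        counts[(filename, int(lineno), function)] += 1
    return counts
```

Passing an open console.log file (or any iterable of lines) to
count_splats() returns a Counter, so most_common() lists the most
frequent warning sites first.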

The TREE07 runs fared worse, with many more splats, starting with a
page fault. The scripting reported a hang, but it looks instead like
there were so many splats that the test failed to terminate itself in
time. I ran two TREE07 runs of four hours each.

Thanx, Paul