Re: sched_setscheduler() vs idle_balance() race

From: Mike Galbraith
Date: Sat May 30 2015 - 09:08:53 EST


On Thu, 2015-05-28 at 15:53 +0200, Peter Zijlstra wrote:
> On Thu, May 28, 2015 at 09:43:52AM +0200, Mike Galbraith wrote:
> > Hi Peter,
> >
> > I'm not seeing what prevents pull_task() from yanking a task out from
> > under __sched_setscheduler(). A box sprinkling smoldering 3.0 kernel
> > wreckage all over my bugzilla mbox isn't seeing it either ;-)
>
> Say, how easy can that thing be reproduced?
>
> The below is compile tested only, but it might just work if I didn't
> miss anything :-)

It seems that trying to make the target invisible to balancing created
a new race: dequeue the target, do stuff that may drop rq->lock while
it's dequeued, the target sneaks into schedule() and dequeues itself
(#2), boom.
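
For illustration only, here is a minimal userspace sketch of what a second dequeue does to the runqueue accounting. It is not kernel code: struct toy_rq, toy_enqueue(), toy_dequeue() and replay_double_dequeue() are made-up names, the task is assumed to be a nice-0 CFS task with weight 1024, and the order of events is the suspected one described above, not a confirmed trace.

```c
#include <assert.h>

#define NICE_0_LOAD 1024UL	/* assumed: load weight of a nice-0 task */

/* Toy stand-in for the two counters of interest; NOT the kernel's struct rq. */
struct toy_rq {
	unsigned int  nr_running;
	unsigned long load_weight;
};

static void toy_enqueue(struct toy_rq *rq)
{
	rq->nr_running++;
	rq->load_weight += NICE_0_LOAD;
}

/*
 * Deliberately has no "is it queued?" check, to model a path that
 * dequeues a task which is already off the runqueue.
 */
static void toy_dequeue(struct toy_rq *rq)
{
	rq->nr_running--;
	rq->load_weight -= NICE_0_LOAD;
}

/* Replay the suspected order of events on a runqueue holding one task. */
struct toy_rq replay_double_dequeue(void)
{
	struct toy_rq rq = { 0, 0 };

	toy_enqueue(&rq);		/* target is runnable */
	assert(rq.nr_running == 1);

	toy_dequeue(&rq);		/* setscheduler path takes it off the rq */
	/* the real code may drop rq->lock at this point */
	toy_dequeue(&rq);		/* target enters schedule(), dequeue #2 */

	return rq;			/* both unsigned counters have wrapped */
}
```

Both counters are unsigned, so the extra decrement wraps them rather than going negative. On an LP64 build the result is nr_running = 0xffffffff and load_weight = 0xfffffffffffffc00, the same values that show up in the rq dump further down.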

On my desktop box I hit:

crash> bt
PID: 6281 TASK: ffff880401950000 CPU: 5 COMMAND: "massive_intr_x"
#0 [ffff8800da9d79c0] machine_kexec at ffffffff8103c428
#1 [ffff8800da9d7a20] crash_kexec at ffffffff810c98e5
#2 [ffff8800da9d7af0] oops_end at ffffffff81006418
#3 [ffff8800da9d7b20] no_context at ffffffff815b4296
#4 [ffff8800da9d7b80] __bad_area_nosemaphore at ffffffff815b4353
#5 [ffff8800da9d7bd0] bad_area at ffffffff815b4691
#6 [ffff8800da9d7c00] __do_page_fault at ffffffff81044eba
#7 [ffff8800da9d7c70] do_page_fault at ffffffff8104500c
#8 [ffff8800da9d7c80] page_fault at ffffffff815c10b2
[exception RIP: set_next_entity+28]
RIP: ffffffff81080bac RSP: ffff8800da9d7d38 RFLAGS: 00010092
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000044aa200
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88041ed55968
RBP: ffff8800da9d7d78 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffff88041ed55968
R13: 0000000000015900 R14: 0000000000000005 R15: ffff88041ed55900
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff8800da9d7d80] pick_next_task_fair at ffffffff81084071
#10 [ffff8800da9d7df0] __schedule at ffffffff815bb507
#11 [ffff8800da9d7e40] schedule at ffffffff815bbcb7
#12 [ffff8800da9d7e60] do_nanosleep at ffffffff815be615
#13 [ffff8800da9d7ea0] hrtimer_nanosleep at ffffffff810acc96
#14 [ffff8800da9d7f20] sys_nanosleep at ffffffff810acdb6
#15 [ffff8800da9d7f50] system_call_fastpath at ffffffff815bf61b
RIP: 00007fbde0eb3130 RSP: 00007ffcb4a0a7e8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: ffff8800da9d7f90 RCX: 00007fbde0eb3130
RDX: 0000000055695905 RSI: 0000000000000000 RDI: 00007ffcb4a0a800
RBP: 00007ffcb4a0a7e0 R8: 0000000000000000 R9: 00007ffcb4a0a740
R10: 00007ffcb4a0a5b0 R11: 0000000000000246 R12: 0000000000000010
R13: ffffffff815bf5f2 R14: ffffffffffffff10 R15: 00007ffcb4a0a800
ORIG_RAX: 0000000000000023 CS: 0033 SS: 002b
crash> struct -x rq ffff88041ed55900
struct rq {
  lock = {
    raw_lock = {
      {
        head_tail = 0xb0ae,
        tickets = {
          head = 0xae,
          tail = 0xb0
        }
      }
    }
  },
  nr_running = 0xffffffff,
  cpu_load = {0x3ff, 0x3ea, 0x35a, 0x29b, 0x225},
  last_load_update_tick = 0xffffa827,
  nohz_stamp = 0x0,
  nohz_flags = 0x0,
  load = {
    weight = 0xfffffffffffffc00,
    inv_weight = 0x0
  },
  nr_load_updates = 0xb644,
  nr_switches = 0x5b5b0,
  cfs = {
    load = {
      weight = 0xfffffffffffffc00,
      inv_weight = 0x0
    },
    nr_running = 0xffffffff,
    h_nr_running = 0xffffffff,
    exec_clock = 0xc9070a1c3,
    min_vruntime = 0x70cee14b9,
    tasks_timeline = {
      rb_node = 0x0
    },
    rb_leftmost = 0x0,
    curr = 0x0,
    next = 0x0,
    last = 0x0,
    skip = 0x0,

