Re: [PATCH v3 5/6] sched/deadline/rtmutex: Fix unprotected PI access in enqueue_task_dl()

From: Xunlei Pang
Date: Thu Apr 14 2016 - 22:19:21 EST


On 2016/04/15 at 09:58, Xunlei Pang wrote:
> On 2016/04/14 at 23:31, Peter Zijlstra wrote:
>> On Thu, Apr 14, 2016 at 07:37:06PM +0800, Xunlei Pang wrote:
>>> We access @pi_task's data without any lock in enqueue_task_dl(), though
>>> checked "dl_prio(pi_task->normal_prio)" condition, that's not enough.
>> The proper fix is to ensure that pi_task is guaranteed to be blocked.
> Even if pi_task was blocked, its parameters are still allowed to be changed,
> so we have to do that. Did I miss something?
>
> Regards,
> Xunlei

Fortunately, I just reproduced through an overnight test, so it really happened in reality as I thought.

[50697.042391] kernel BUG at kernel/sched/deadline.c:398!
[50697.048212] invalid opcode: 0000 [#1] SMP
[50697.137676] CPU: 1 PID: 10676 Comm: bugon Tainted: G W 4.6.0-rc3+ #19
[50697.146250] Hardware name: Intel Corporation Broadwell Client platform/SawTooth Peak, BIOS BDW-E1R1.86C.0127.R00.150
8062034 08/06/2015
[50697.159942] task: ffff880089d72b80 ti: ffff880074bb4000 task.ti: ffff880074bb4000
[50697.168420] RIP: 0010:[<ffffffff810cb4ef>] [<ffffffff810cb4ef>] replenish_dl_entity+0xff/0x110
[50697.178292] RSP: 0000:ffff88016ec43d90 EFLAGS: 00010046
[50697.184307] RAX: 0000000000000001 RBX: ffff880089d72d50 RCX: 0000000000000001
[50697.192390] RDX: 0000000000000010 RSI: ffff8800719858d0 RDI: ffff880089d72d50
[50697.200473] RBP: ffff88016ec43da8 R08: 0000000000000001 R09: 0000000000000097
[50697.208556] R10: 0000000057102e72 R11: 000000000f9e6fd7 R12: ffff88016ec56e40
[50697.216638] R13: ffff88016ec56e40 R14: 0000000000016e40 R15: ffff880089d72d50
[50697.224721] FS: 00007f14e788b700(0000) GS:ffff88016ec40000(0000) knlGS:0000000000000000
[50697.233887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50697.240396] CR2: 000055be08240c68 CR3: 000000008a5d5000 CR4: 00000000003406e0
[50697.248478] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[50697.256561] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[50697.264643] Stack:
[50697.266917] ffff88016ec56e40 ffff880089d72b80 0000000000000010 ffff88016ec43de8
[50697.275338] ffffffff810cbfe4 ffff88016ec43de8 ffff880089d72b80 ffff88016ec56e40
[50697.283757] 00000000000188c5 ffff880089d72d50 ffff88016ec4f228 ffff88016ec43e18
[50697.292175] Call Trace:
[50697.294943] <IRQ>
[50697.297122] [<ffffffff810cbfe4>] enqueue_task_dl+0x264/0x340
[50697.303838] [<ffffffff810cc453>] update_curr_dl+0x1c3/0x1f0
[50697.310249] [<ffffffff810cc51c>] task_tick_dl+0x1c/0x80
[50697.316265] [<ffffffff810b66ac>] scheduler_tick+0x5c/0xe0
[50697.322480] [<ffffffff811060d0>] ? tick_sched_do_timer+0x50/0x50
[50697.329383] [<ffffffff810f60e1>] update_process_times+0x51/0x60
[50697.336188] [<ffffffff81105a25>] tick_sched_handle.isra.17+0x25/0x60
[50697.343486] [<ffffffff8110610d>] tick_sched_timer+0x3d/0x70
[50697.349895] [<ffffffff810f6c93>] __hrtimer_run_queues+0xf3/0x270
[50697.356797] [<ffffffff810f7168>] hrtimer_interrupt+0xa8/0x1a0
[50697.363404] [<ffffffff81053de5>] local_apic_timer_interrupt+0x35/0x60
[50697.370799] [<ffffffff816b7a1d>] smp_apic_timer_interrupt+0x3d/0x50
[50697.377996] [<ffffffff816b5b5c>] apic_timer_interrupt+0x8c/0xa0
[50697.384798] <EOI>
[50697.384798] <EOI>
[50697.386974] Code: a9 48 c7 c7 38 5f a0 81 31 c0 48 89 75 e8 c6 05 5c 48 c8 00 01 e8 74 20 0c 00 49 8b 84 24 28 09 00 00 8b 4b 54 48 8b 75 e8 eb c4 <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[50697.409201] RIP [<ffffffff810cb4ef>] replenish_dl_entity+0xff/0x110
[50697.416409] RSP <ffff88016ec43d90>
[50697.433683] ---[ end trace da6e1e42babefb7f ]---
[50697.438913] Kernel panic - not syncing: Fatal exception in interrupt
[50698.484088] Shutting down cpus with NMI
[50698.488434] Kernel Offset: disabled
[50698.492383] ---[ end Kernel panic - not syncing: Fatal exception in interrupt