Re: BUG_ON in rcu_sync_func triggered

From: Nikolay Borisov
Date: Tue Sep 13 2016 - 04:21:20 EST




On 09/12/2016 04:01 PM, Oleg Nesterov wrote:
> Hi Nikolay,
>
[SNIP..]
>>
>>
>> The bug on in question is this: BUG_ON(rsp->gp_state != GP_PASSED);
>>
>> Have you seen something like that before - the kernel is fairly old 4.4.2,
>
> No... thanks, I'll try to look tomorrow.

I just re-run the test with kernel 4.4.14 and PROVE_RCU and DEBUG_RCU_OBJECTS
enabled and here is what I got:

[ 2380.195422] ------------[ cut here ]------------
[ 2380.195770] kernel BUG at kernel/rcu/sync.c:111!
[ 2380.196107] invalid opcode: 0000 [#1] SMP
[ 2380.204519] CPU: 5 PID: 16265 Comm: fsfreeze Not tainted 4.4.14-clouder5-debug #6
[ 2380.205109] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
[ 2380.205453] task: ffff880444572d00 ti: ffff880450690000 task.ti: ffff880450690000
[ 2380.206043] RIP: 0010:[<ffffffff810c1c93>] [<ffffffff810c1c93>] rcu_sync_enter+0xf3/0x100
[ 2380.206723] RSP: 0018:ffff880450693db8 EFLAGS: 00010202
[ 2380.207062] RAX: ffff880444572d00 RBX: ffff880445cfe4f8 RCX: 0000000000000006
[ 2380.207406] RDX: 0000000000000006 RSI: ffff8804445734a0 RDI: ffff880444572d00
[ 2380.207750] RBP: ffff880450693df8 R08: 0000000000000004 R09: 0000000000000000
[ 2380.208094] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880445cfe500
[ 2380.208436] R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000000000
[ 2380.214641] FS: 00007faaf3bde700(0000) GS:ffff88047fc80000(0000) knlGS:0000000000000000
[ 2380.215235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2380.215577] CR2: 00007f3f67f4d038 CR3: 000000045722d000 CR4: 00000000000406e0
[ 2380.215924] Stack:
[ 2380.216259] 0000000000000000 dead000000000100 dead000000000200 ffff880444572d00
[ 2380.217179] 0000000000000000 ffff880445cfe4f8 ffff880445cfe640 0000000000000000
[ 2380.218105] ffff880450693e48 ffffffff810a9ded ffff880450693e68 ffffffff81351093
[ 2380.219026] Call Trace:
[ 2380.219366] [<ffffffff810a9ded>] percpu_down_write+0x1d/0xf0
[ 2380.219715] [<ffffffff81351093>] ? call_rwsem_down_write_failed+0x13/0x20
[ 2380.220062] [<ffffffff811cc5ef>] sb_wait_write+0x2f/0x60
[ 2380.220402] [<ffffffff811cc817>] freeze_super+0x87/0x140
[ 2380.220744] [<ffffffff811deb08>] do_vfs_ioctl+0x428/0x570
[ 2380.221085] [<ffffffff811cf1e5>] ? SYSC_newfstat+0x25/0x30
[ 2380.221427] [<ffffffff811ea9e6>] ? __fget_light+0x66/0x90
[ 2380.221767] [<ffffffff811decc9>] SyS_ioctl+0x79/0x90
[ 2380.222111] [<ffffffff8169851b>] entry_SYSCALL_64_fastpath+0x16/0x73
[ 2380.222454] Code: 75 c0 ba 02 00 00 00 4c 89 e7 e8 59 0f fe ff 83 3b 02 75 e5 48 8d 75 c0 4c 89 e7 e8 18 0c fe ff 48 83 c4 28 5b 41 5c 41 5d 5d c3 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89
[ 2380.228692] RIP [<ffffffff810c1c93>] rcu_sync_enter+0xf3/0x100
[ 2380.229114] RSP <ffff880450693db8>

The bug on is this one: BUG_ON(need_wait && need_sync);

Interestingly, there wasn't any complaint from prove_rcu_locking.

Regards,
Nikolay