Re: BUG_ON in rcu_sync_func triggered
From: Nikolay Borisov
Date: Tue Sep 13 2016 - 10:38:19 EST
On 09/13/2016 05:35 PM, Nikolay Borisov wrote:
>
>
> On 09/13/2016 04:43 PM, Oleg Nesterov wrote:
>> On 09/13, Oleg Nesterov wrote:
>>>
>>> OK... perhaps the unbalanced up_write... I'll try to look at freeze/thaw code,
>>
>> Heh, yes, it looks racy or I am totally confused.
>>
>>> could you test the debugging patch below meanwhile?
>>
>> Yes please. I'll send you another patch (hopefully fix) later, but it
>> would be nice if you can test this patch to get more info.
>
> I've already started testing with this patch on 4.4.20 this time to see
> what happens, but I'll likely get results tomorrow. For now I wasn't
> able to crash it.

Actually, forget that; here is a warning that this triggered:

[ 844.284959] ------------[ cut here ]------------
[ 844.290454] WARNING: CPU: 2 PID: 1900 at kernel/rcu/sync.c:160 rcu_sync_func+0xc8/0x150()
[ 844.300154] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
[ 844.375006] CPU: 2 PID: 1900 Comm: fio Not tainted 4.4.20-clouder1 #9
[ 844.382524] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
[ 844.390241] 0000000000000000 ffff880277d03d78 ffffffff81307a9b 000000000000076c
[ 844.399416] 0000000000000000 0000000000000000 00000000000000a0 ffff880277d03db8
[ 844.408598] ffffffff81054a85 ffff880277d03dc8 ffff88047527daa0 ffff88047527da78
[ 844.417771] Call Trace:
[ 844.420822] <IRQ> [<ffffffff81307a9b>] dump_stack+0x6b/0xa0
[ 844.427659] [<ffffffff81054a85>] warn_slowpath_common+0x95/0xe0
[ 844.434695] [<ffffffff81054aea>] warn_slowpath_null+0x1a/0x20
[ 844.441532] [<ffffffff810ab788>] rcu_sync_func+0xc8/0x150
[ 844.447983] [<ffffffff810b0620>] rcu_process_callbacks+0x290/0x740
[ 844.455310] [<ffffffff810bbc52>] ? ktime_get+0x52/0xc0
[ 844.461459] [<ffffffff810590f3>] __do_softirq+0x113/0x330
[ 844.467909] [<ffffffff810593e5>] irq_exit+0x75/0x80
[ 844.473775] [<ffffffff8163ea16>] smp_apic_timer_interrupt+0x46/0x55
[ 844.481200] [<ffffffff8163d069>] apic_timer_interrupt+0x89/0x90
[ 844.488234] <EOI> [<ffffffff811477b0>] ? shrink_inactive_list+0x1e0/0x5c0
[ 844.496426] [<ffffffff811477a8>] ? shrink_inactive_list+0x1d8/0x5c0
[ 844.503848] [<ffffffff8113c468>] ? global_dirty_limits+0x98/0xc0
[ 844.510984] [<ffffffff8113c909>] ? throttle_vm_writeout+0x39/0xc0
[ 844.518214] [<ffffffff811481c9>] shrink_lruvec+0x289/0x390
[ 844.524754] [<ffffffff8119a6f9>] ? mem_cgroup_iter+0x2a9/0x3e0
[ 844.531687] [<ffffffff811ce98c>] ? wb_queue_work+0x8c/0x100
[ 844.538333] [<ffffffff811483fa>] shrink_zone+0x12a/0x360
[ 844.544686] [<ffffffff8119e9b8>] ? vmpressure+0x88/0x90
[ 844.550943] [<ffffffff811489ad>] do_try_to_free_pages+0x17d/0x450
[ 844.558174] [<ffffffff81199451>] ? mem_cgroup_select_victim_node+0x1d1/0x1f0
[ 844.566468] [<ffffffff81148d35>] try_to_free_mem_cgroup_pages+0xb5/0x190
[ 844.574375] [<ffffffff8119d9dd>] try_charge+0x22d/0x720
[ 844.580631] [<ffffffff8113025e>] ? find_get_entry+0x3e/0xd0
[ 844.587281] [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[ 844.593827] [<ffffffff8130c443>] ? radix_tree_lookup_slot+0x13/0x30
[ 844.601251] [<ffffffff8119e637>] mem_cgroup_try_charge+0x57/0x150
[ 844.608478] [<ffffffff81131b2c>] __add_to_page_cache_locked+0x4c/0x270
[ 844.616194] [<ffffffff811db990>] ? __block_commit_write+0x80/0xb0
[ 844.623419] [<ffffffff81131d78>] add_to_page_cache_lru+0x28/0x80
[ 844.630548] [<ffffffff81131e67>] pagecache_get_page+0x97/0x1e0
[ 844.637484] [<ffffffff81131fdb>] grab_cache_page_write_begin+0x2b/0x50
[ 844.645202] [<ffffffff8123ff2d>] ext4_da_write_begin+0x17d/0x330
[ 844.652334] [<ffffffff8123c716>] ? ext4_dirty_inode+0x66/0x80
[ 844.659167] [<ffffffff8112ff80>] generic_perform_write+0xd0/0x1f0
[ 844.666385] [<ffffffff81132916>] __generic_file_write_iter+0x196/0x1f0
[ 844.674102] [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[ 844.680648] [<ffffffff81233b0f>] ext4_file_write_iter+0x11f/0x3a0
[ 844.687874] [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[ 844.694418] [<ffffffff812339f0>] ? ext4_unwritten_wait+0xc0/0xc0
[ 844.701547] [<ffffffff811f1a1e>] aio_run_iocb+0x1ee/0x290
[ 844.707999] [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[ 844.714537] [<ffffffff811f1de1>] do_io_submit+0x321/0x530
[ 844.720989] [<ffffffff811f1388>] ? SyS_io_getevents+0x58/0xc0
[ 844.727828] [<ffffffff81002017>] ? trace_hardirqs_on_thunk+0x17/0x19
[ 844.735345] [<ffffffff811f2000>] SyS_io_submit+0x10/0x20
[ 844.741701] [<ffffffff8163c357>] entry_SYSCALL_64_fastpath+0x12/0x6a
[ 844.749230] ---[ end trace 5f72aeec215954f4 ]---
[ 844.754708] XXX: ffff88047527da78 gp=2 cnt=0 cb=1