BUG: Scheduler crashes with 4.19 kernels

From: Huttunen, Janne (Nokia - FI/Espoo)
Date: Thu Jan 30 2020 - 03:09:38 EST



We have experienced somewhat reproducible scheduler crashes
and BUGs with a 4.19 kernel. The workload is some kind of
container related stress-test i.e. it creates and destroys
containers in a rapid rate. Currently I don't have any more
details about the specifics of the test case.

Within a few hours of testing, the scheduler code in the
kernel either triggers an assertion or tries to dereference
a completely invalid pointer and crashes there.

An example of a BUG triggering:

[136833.901205] list_add corruption. next->prev should be prev (ffff8e8d4406dd40), but was ffff8eb17fce2c68. (next=ffff8e8d4642d940).
[136833.913519] ------------[ cut here ]------------
[136833.918550] kernel BUG at lib/list_debug.c:25!
[136833.923405] invalid opcode: 0000 [#1] SMP PTI
[136833.928181] CPU: 1 PID: 3330720 Comm: entrypoint.sh Kdump: loaded Not tainted 4.19.93-1.el7.centos.ncir.x86_64 #1
[136833.939098] Hardware name: Nokia Solutions and Networks AE-SER1U-A/AF1801.02/AE-SER1U-A/AF1801.02, BIOS 3B17 10/09/2019
[136833.950532] RIP: 0010:__list_add_valid+0x3a/0x70
[136833.955563] Code: 48 8b 32 48 39 f0 75 3a 48 39 c7 74 1f 48 39 fa 74 1a b8 01 00 00 00 5d c3 48 89 c1 48 c7 c7 f8 87 ab bb 31 c0 e8 c4 37 ce ff <0f> 0b 48 89 c1 48 89 fe 31 c0 48 c7 c7 a8 88 ab bb e8 ae 37 ce ff
[136833.975232] RSP: 0018:ffffa9846bcdfd88 EFLAGS: 00010086
[136833.980872] RAX: 0000000000000075 RBX: ffff8e8d44126c00 RCX: 0000000000000000
[136833.988654] RDX: 0000000000000001 RSI: ffff8eb17f456a98 RDI: ffff8eb17f456a98
[136833.996432] RBP: ffffa9846bcdfd88 R08: 0000000000000000 R09: 0000000000002e76
[136834.004216] R10: ffff8e8247c15000 R11: 0000000000000000 R12: ffff8eb17fce2300
[136834.012000] R13: ffff8e8d44126d40 R14: ffff8e8d4406dd40 R15: ffff8e8d4642d940
[136834.019785] FS: 00007f0066569740(0000) GS:ffff8eb17f440000(0000) knlGS:0000000000000000
[136834.028519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[136834.034678] CR2: 00007f00666d2c50 CR3: 0000000bf1164005 CR4: 00000000007606e0
[136834.042463] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[136834.050243] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[136834.058026] PKRU: 55555554
[136834.061146] Call Trace:
[136834.064005] enqueue_entity+0x1ee/0x360
[136834.068253] ? propagate_entity_cfs_rq.isra.77+0x34f/0x570
[136834.074152] enqueue_task_fair+0x91/0x730
[136834.078574] ? attach_entity_cfs_rq+0x315/0x5e0
[136834.083518] activate_task+0x54/0xb0
[136834.087503] wake_up_new_task+0x14d/0x290
[136834.091927] _do_fork+0x145/0x360
[136834.095656] __x64_sys_clone+0x27/0x30
[136834.099818] do_syscall_64+0x65/0x1b0
[136834.103891] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[136834.109356] RIP: 0033:0x7f0066634c8f
[136834.113348] Code: ed 0f 85 f4 00 00 00 64 4c 8b 0c 25 10 00 00 00 45 31 c0 4d 8d 91 d0 02 00 00 31 d2 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 8d 00 00 00 41 89 c5 85 c0 0f 85 9a 00 00
[136834.133017] RSP: 002b:00007ffc1789b130 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[136834.141236] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f0066634c8f
[136834.149022] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[136834.156800] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f0066569740
[136834.164582] R10: 00007f0066569a10 R11: 0000000000000246 R12: 0000000000000000
[136834.172369] R13: 00007ffc1789b1f0 R14: 0000556ca04d8280 R15: 0000556ca02bfde0
[136834.180148] Modules linked in: rbd libceph dns_resolver nbd binfmt_misc tcp_diag inet_diag ebtable_filter ebtables ip6table_filter ip6_tables veth ipt_REJECT nf_reject_ipv4 vxlan ip6_udp_tunnel udp_tunnel xt_statistic xt_nat xt_comment xt_mark xt_conntrack ipt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables bpfilter overlay fuse btrfs xor zstd_compress raid6_pq zstd_decompress xxhash msdos 8021q garp mrp iTCO_wdt iTCO_vendor_support intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp vfat fat ext4 kvm_intel mbcache jbd2 kvm intel_cstate pcspkr ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_i801 joydev cdc_ether usbnet mei_me sg mii mei lpc_ich
[136834.253972] wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi ib_srpt target_core_mod pcc_cpufreq ib_srp scsi_transport_srp ioatdma ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm iw_cm br_netfilter bridge stp llc vfio_pci vfio_virqfd vfio_iommu_type1 vfio irqbypass sctp xfs libcrc32c mlx5_ib ib_uverbs ib_core sr_mod cdrom sd_mod scsi_transport_iscsi crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel i2c_algo_bit mlx5_core pcbc dca mlxfw ahci devlink aesni_intel i2c_core libahci crypto_simd cryptd ptp glue_helper libata pps_core uas usb_storage sunrpc dm_mirror dm_region_hash dm_log dm_mod

An example of invalid pointer (from a different machine):

[136509.496283] general protection fault: 0000 [#1] SMP PTI
[136509.501931] CPU: 34 PID: 3218165 Comm: stress-ng Kdump: loaded Not tainted 4.19.93-1.el7.centos.ncir.x86_64 #1
[136509.512585] Hardware name: Nokia Solutions and Networks AE-SER1U-A/AF1801.02/AE-SER1U-A/AF1801.02, BIOS 3B17 10/09/2019
[136509.524040] RIP: 0010:__list_add_valid+0x4/0x70
[136509.528989] Code: ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 83 c4 38 b8 f4 ff ff ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 90 90 90 90 90 90 55 48 89 d0 <48> 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48 39 f0 75 3a 48 39
[136509.548662] RSP: 0000:ffffa015ffa83da8 EFLAGS: 00010046
[136509.554299] RAX: 12df5b915e214939 RBX: ffff9ff22650de00 RCX: 0000000000000000
[136509.562087] RDX: 12df5b915e214939 RSI: ffff9ff1d0411740 RDI: ffff9ff22650df40
[136509.569868] RBP: ffffa015ffa83de8 R08: ffff9ff07a2cdb80 R09: 0000000000000000
[136509.577654] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa015ffae2300
[136509.585444] R13: ffff9ff22650df40 R14: ffff9ff1d0411740 R15: 12df5b915e214939
[136509.593237] FS: 00007f61ab946f40(0000) GS:ffffa015ffa80000(0000) knlGS:0000000000000000
[136509.601985] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[136509.608145] CR2: 00007f61ab946eb8 CR3: 0000000aa281a001 CR4: 00000000007606e0
[136509.615933] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[136509.623724] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[136509.631507] PKRU: 55555554
[136509.634619] Call Trace:
[136509.637482] <IRQ>
[136509.639906] ? enqueue_entity+0x1ee/0x360
[136509.644333] enqueue_task_fair+0x91/0x730
[136509.648754] ? _raw_spin_lock_irqsave+0x25/0x50
[136509.653694] ? _raw_spin_unlock_irqrestore+0x3a/0x40
[136509.659070] activate_task+0x54/0xb0
[136509.663059] ? sched_clock_cpu+0x11/0xa0
[136509.667395] ttwu_do_activate+0x49/0x80
[136509.671641] try_to_wake_up+0x1d1/0x460
[136509.675887] ? hrtimer_init+0x1b0/0x1b0
[136509.680134] wake_up_process+0x15/0x20
[136509.684295] hrtimer_wakeup+0x22/0x30
[136509.688364] __hrtimer_run_queues+0xf7/0x250
[136509.693048] hrtimer_interrupt+0x116/0x250
[136509.697554] ? irqtime_account_irq+0x39/0xb0
[136509.702238] smp_apic_timer_interrupt+0x6f/0x150
[136509.707274] apic_timer_interrupt+0xf/0x20
[136509.711785] </IRQ>
[136509.714294] RIP: 0010:__do_page_fault+0x25d/0x4f0
[136509.719415] Code: 00 00 48 83 80 58 06 00 00 01 0f 1f 44 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 3a 1e 12 00 fb 66 0f 1f 44 00 00 <41> bc 54 00 00 00 48 83 cb 04 e9 84 fe ff ff 48 8b 4d c8 4c 89 ea
[136509.739094] RSP: 0000:ffffb9be64a4feb0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[136509.747312] RAX: ffffb9be64a4ff10 RBX: 0000000000000007 RCX: 0000000000000000
[136509.763078] RDX: 00007f61ab946eb8 RSI: 0000000000000007 RDI: ffffb9be64a4ff58
[136509.770868] RBP: ffffb9be64a4ff10 R08: 0000000000000002 R09: ffffff9a0eaab816
[136509.778658] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000007
[136509.786440] R13: ffffb9be64a4ff58 R14: ffff9ff076c0cc80 R15: 0000000000000000
[136509.794229] ? sched_clock+0x9/0x10
[136509.798129] do_page_fault+0x37/0x140
[136509.802204] ? __context_tracking_exit+0x5a/0xb0
[136509.807234] ? page_fault+0x8/0x30
[136509.811051] page_fault+0x1e/0x30
[136509.814778] RIP: 0033:0x7f61aba11d6d
[136509.818765] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 6d 00 00 00 0f 05 48 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd 60 0f 00 f7 d8 <64> 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
[136509.838439] RSP: 002b:00007ffd6a084188 EFLAGS: 00010213
[136509.844075] RAX: 0000000000000001 RBX: 0000000000000032 RCX: ffffffffffffff78
[136509.851866] RDX: 0000000000000001 RSI: 0000000000000009 RDI: 0000000000000000
[136509.859658] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f61abb362f0
[136509.867452] R10: 00007f61ab946f40 R11: 0000000000000246 R12: 0000000000000001
[136509.875239] R13: 0000557a613ec640 R14: 0000000000000002 R15: 0000557a613eb0e0
[136509.883023] Modules linked in: rbd libceph dns_resolver nbd tcp_diag inet_diag binfmt_misc ebtable_filter ebtables ip6table_filter ip6_tables veth ipt_REJECT nf_reject_ipv4 vxlan ip6_udp_tunnel udp_tunnel xt_statistic xt_nat xt_comment xt_mark xt_conntrack ipt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables bpfilter overlay fuse btrfs xor zstd_compress raid6_pq zstd_decompress xxhash msdos 8021q garp mrp iTCO_wdt iTCO_vendor_support intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel vfat fat kvm ext4 mbcache jbd2 intel_cstate ttm pcspkr drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_i801 lpc_ich mei_me joydev mei sg cdc_ether usbnet
[136509.956517] mii ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi ib_srpt target_core_mod ib_srp scsi_transport_srp pcc_cpufreq ib_ipoib acpi_power_meter acpi_pad rdma_ucm ib_umad rdma_cm ib_cm iw_cm vfio_pci vfio_virqfd vfio_iommu_type1 vfio irqbypass sctp br_netfilter bridge stp llc xfs libcrc32c mlx5_ib ib_uverbs ib_core sr_mod sd_mod cdrom igb crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel i2c_algo_bit pcbc mlx5_core dca aesni_intel i2c_core mlxfw ahci devlink libahci crypto_simd scsi_transport_iscsi ptp cryptd libata glue_helper pps_core uas usb_storage sunrpc dm_mirror dm_region_hash dm_log dm_mod

Based on the similarity of our crashes and the one reported
in the commit log, I cherry picked f6783319737f to top of
4.19.97 (with 5d299eabea5a as a prerequisite to make the
fix apply more cleanly). With the patched kernel the test
ran for three days without a crash in any of the machines.

So, do you think f6783319737f is the proper fix? Should it
be backported to 4.19.y? Does it require (or benefit from)
some additional patches that should be backported too?