md raid6 oops in 6.6.4 stable

From: Genes Lists
Date: Thu Dec 07 2023 - 08:10:12 EST


I have not had a chance to git bisect this yet, but since it happened in stable I thought it was important to share sooner rather than later.

One possibly relevant commit between 6.6.3 and 6.6.4 could be:

commit 2c975b0b8b11f1ffb1ed538609e2c89d8abf800e
Author: Song Liu <song@xxxxxxxxxx>
Date: Fri Nov 17 15:56:30 2023 -0800

md: fix bi_status reporting in md_end_clone_io

The attached log shows a page_fault_oops.
The machine had been up for 3 days before the crash happened.

gene

Dec 06 19:20:54 s6 kernel: BUG: unable to handle page fault for address: ffff8881019312e8
Dec 06 19:20:54 s6 kernel: #PF: supervisor write access in kernel mode
Dec 06 19:20:54 s6 kernel: #PF: error_code(0x0003) - permissions violation
Dec 06 19:20:54 s6 kernel: PGD 336e01067 P4D 336e01067 PUD 1019ee063 PMD 1019f0063 PTE 8000000101931021
Dec 06 19:20:54 s6 kernel: Oops: 0003 [#1] PREEMPT SMP PTI
Dec 06 19:20:54 s6 kernel: CPU: 3 PID: 773 Comm: md127_raid6 Not tainted 6.6.4-stable-1 #4 784c1c710646cffc1e8cc5978f8f6cec974aa179
Dec 06 19:20:54 s6 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z370 Extreme4, BIOS P4.20 10/31/2019
Dec 06 19:20:54 s6 kernel: RIP: 0010:update_io_ticks+0x2c/0x60
Dec 06 19:20:54 s6 kernel: Code: 1f 00 0f 1f 44 00 00 48 8b 4f 28 48 39 f1 78 17 80 7f 31 00 74 3b 48 8b 47 10 48 8b 78 40 48 8b 4f 28 48 39 f1 79 e9 48 89 c8 <f0> 48 0f b1 77 28 75 de 48 89 f0 48 29 c8 84 d2 b9 01 00 >
Dec 06 19:20:54 s6 kernel: RSP: 0018:ffffc90000c0bb78 EFLAGS: 00010296
Dec 06 19:20:54 s6 kernel: RAX: cccccccccccccccc RBX: ffff8881019312c0 RCX: cccccccccccccccc
Dec 06 19:20:54 s6 kernel: RDX: 0000000000000001 RSI: 0000000110f28f4e RDI: ffff8881019312c0
Dec 06 19:20:54 s6 kernel: RBP: 0000000000000001 R08: ffff888104cc1760 R09: 0000000080200016
Dec 06 19:20:54 s6 kernel: R10: ffff88851f0ced00 R11: ffff8888beffb000 R12: 0000000000000008
Dec 06 19:20:54 s6 kernel: R13: 0000000000000028 R14: 0000000000000008 R15: 0000000000000048
Dec 06 19:20:54 s6 kernel: FS: 0000000000000000(0000) GS:ffff88889eec0000(0000) knlGS:0000000000000000
Dec 06 19:20:54 s6 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 06 19:20:54 s6 kernel: CR2: ffff8881019312e8 CR3: 0000000336020002 CR4: 00000000003706e0
Dec 06 19:20:54 s6 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 06 19:20:54 s6 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Dec 06 19:20:54 s6 kernel: Call Trace:
Dec 06 19:20:54 s6 kernel: <TASK>
Dec 06 19:20:54 s6 kernel: ? __die+0x23/0x70
Dec 06 19:20:54 s6 kernel: ? page_fault_oops+0x171/0x4e0
Dec 06 19:20:54 s6 kernel: ? exc_page_fault+0x175/0x180
Dec 06 19:20:54 s6 kernel: ? asm_exc_page_fault+0x26/0x30
Dec 06 19:20:54 s6 kernel: ? update_io_ticks+0x2c/0x60
Dec 06 19:20:54 s6 kernel: bdev_end_io_acct+0x63/0x160
Dec 06 19:20:54 s6 kernel: md_end_clone_io+0x75/0xa0 [md_mod b6ca17ee4ae6c03e518ad33b70ddd658bdb0c03a]
Dec 06 19:20:54 s6 kernel: handle_stripe_clean_event+0x1ee/0x430 [raid456 ca9a49662bf54a9ebef65a8016b05e6c30248d77]
Dec 06 19:20:54 s6 kernel: handle_stripe+0x7b6/0x1ac0 [raid456 ca9a49662bf54a9ebef65a8016b05e6c30248d77]
Dec 06 19:20:54 s6 kernel: handle_active_stripes.isra.0+0x38d/0x550 [raid456 ca9a49662bf54a9ebef65a8016b05e6c30248d77]
Dec 06 19:20:54 s6 kernel: raid5d+0x488/0x750 [raid456 ca9a49662bf54a9ebef65a8016b05e6c30248d77]
Dec 06 19:20:54 s6 kernel: ? lock_timer_base+0x61/0x80
Dec 06 19:20:54 s6 kernel: ? prepare_to_wait_event+0x60/0x180
Dec 06 19:20:54 s6 kernel: ? __pfx_md_thread+0x10/0x10 [md_mod b6ca17ee4ae6c03e518ad33b70ddd658bdb0c03a]
Dec 06 19:20:54 s6 kernel: md_thread+0xab/0x190 [md_mod b6ca17ee4ae6c03e518ad33b70ddd658bdb0c03a]
Dec 06 19:20:54 s6 kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Dec 06 19:20:54 s6 kernel: kthread+0xe5/0x120
Dec 06 19:20:54 s6 kernel: ? __pfx_kthread+0x10/0x10
Dec 06 19:20:54 s6 kernel: ret_from_fork+0x31/0x50
Dec 06 19:20:54 s6 kernel: ? __pfx_kthread+0x10/0x10
Dec 06 19:20:54 s6 kernel: ret_from_fork_asm+0x1b/0x30
Dec 06 19:20:54 s6 kernel: </TASK>
Dec 06 19:20:54 s6 kernel: Modules linked in: algif_hash af_alg mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs nft_ct>
Dec 06 19:20:54 s6 kernel: snd_hda_codec kvm snd_hda_core drm_buddy snd_hwdep iTCO_wdt i2c_algo_bit mei_pxp intel_pmc_bxt snd_pcm mei_hdcp ee1004 irqbypass ttm iTCO_vendor_support rapl drm_display_helper nls_iso8859_1>
Dec 06 19:20:54 s6 kernel: CR2: ffff8881019312e8
Dec 06 19:20:54 s6 kernel: ---[ end trace 0000000000000000 ]---
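
For anyone triaging the log above, here is a minimal sketch that decodes the "#PF: error_code(0x0003)" value and the PTE printed on the PGD/P4D/PUD/PMD/PTE line. It assumes the standard x86-64 architectural bit layouts for the page-fault error code and for 4 KiB page-table entries. The helper names (decode_pf_error, decode_pte) are hypothetical and are not kernel APIs.

```python
# Assumption: standard x86-64 bit layouts for the #PF error code and for
# 4 KiB PTEs. Nothing here is taken from the kernel source.

PF_ERROR_BITS = {
    # bit: (meaning when set, meaning when clear)
    0: ("protection violation", "page not present"),
    1: ("write access", "read access"),
    2: ("user mode", "kernel (supervisor) mode"),
    3: ("reserved bit set in page tables", None),
    4: ("instruction fetch", None),
}

PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    5: "accessed",
    6: "dirty",
    63: "no-execute",
}

# Bits 12..51 of a PTE hold the physical frame address.
PTE_ADDR_MASK = 0x000FFFFFFFFFF000


def decode_pf_error(code):
    """Return human-readable descriptions for a #PF error code."""
    out = []
    for bit, (if_set, if_clear) in PF_ERROR_BITS.items():
        text = if_set if code & (1 << bit) else if_clear
        if text:
            out.append(text)
    return out


def decode_pte(pte):
    """Return (list of set flag names, physical frame address) for a PTE."""
    flags = [name for bit, name in PTE_FLAGS.items() if pte & (1 << bit)]
    return flags, pte & PTE_ADDR_MASK


if __name__ == "__main__":
    # Values copied from the oops above.
    print(decode_pf_error(0x0003))
    flags, phys = decode_pte(0x8000000101931021)
    print(flags, hex(phys))
    # Offset of the faulting address (CR2) from RDI.
    print(hex(0xFFFF8881019312E8 - 0xFFFF8881019312C0))
```

Under these assumptions the PTE is present but not writable, which would be consistent with the "permissions violation" reported for a supervisor write. The CR2 minus RDI offset is 0x28, which appears to match the 0x28 displacement in the bytes at the <f0> marker on the Code: line.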