Lockup of (raid5 or raid6) + vdo after taking out a disk under load

From: Konstantin Kharlamov
Date: Fri Jul 12 2024 - 02:53:40 EST


(this is a dup of yesterday's bugzilla bug 219030 as some maintainers
prefer bugreporting over bugzilla some over ML, and the cases are
indistinguishable)

We're seeing this on both LTS `6.1.97` as well the newest stable
`6.9.8` (note though that VDO was only merged in 6.9, so for upstream
it's probably the only one that matters. Traces below are from 6.9.8).

Creating a raid5 (alternatively raid6) + VDO, then starting a load and
after a minute physically taking out a disk of the VG results in a
lockup stack getting spammed over and over.

Seems unreproducible by removing a disk through software means of
`/sys/block/sdX/device/delete`, i.e. it seem to only happen when either
removing the disk physically or by disabling power on the JBOD slot
with a command like `sg_ses --dev-slot-num=9 --set=3:4:1 /dev/sg26`.

Also when lockup happens, `reboot -ff` no longer does anything.

When looking at repeating stacktraces, the execution seems to loop
around `raid5_get_active_stripe`, `raid5_compute_sector`,
`raid5_release_stripe`.

# Steps to reproduce

1. Create raid5 LV + VDO by executing a `./mk_lvm_raid5.sh /dev/sdX1
/dev/sdY1 /dev/sdZ1` where `mk_lvm_raid5.sh` has the following content:

#!/bin/bash

set -exu

if [ "$#" -ne 3 ]; then
echo "Wrong number of parameters.
Usage: $(basename $0) disk1 disk2 disk3"
exit 1
fi

# create the VG
pvcreate -f "$1" "$2" "$3"
vgcreate p_r5 "$1" "$2" "$3"

# create the LV
lvcreate --type raid5 -i 2 -L 21474836480b -I 64K -n
vdo_internal_deco_vol p_r5 -y
lvconvert -y --type vdo-pool --virtualsize 107374182400B -n
deco_vol p_r5/vdo_internal_deco_vol

2. Start load by executing `fio ./fio-30%write.fio` with the `fio-
30%write.fio` having the following content:

[test IOPS]
blocksize=8k
filename=/dev/p_r5/deco_vol
filesize=100G
direct=1
buffered=0
ioengine=libaio
iodepth=32
rw=randrw
rwmixwrite=30
numjobs=4
group_reporting
time_based
runtime=99h
clat_percentiles=0
unlink=1

3. Wait for about a minute
4. Remove a disk of the volume group, either physically, or by turning
off jbod slot's power (DO NOT use /…/device/delete).
5. Wait 30 seconds

## Expected

`dmesg -w` won't show anything interesting

## Actual

`dmesg -w` starts showing up traces as follows (not sure why the
missing line numbers, I have debug symbols locally, I can decode them
manually with `gdb` if needed):

[…]
[ 869.086048] mpt3sas_cm0: log_info(0x31110e05): originator(PL),
code(0x11), sub_code(0x0e05)
[ 869.139959] mpt3sas_cm0: mpt3sas_transport_port_remove: removed:
sas_addr(0x5002538a67b01303)
[ 869.140085] mpt3sas_cm0: removing handle(0x0022),
sas_addr(0x5002538a67b01303)
[ 869.140191] mpt3sas_cm0: enclosure logical
id(0x50015b2140128f7f), slot(12)
[ 869.140293] mpt3sas_cm0: enclosure level(0x0000), connector
name( )
[ 893.262506] watchdog: BUG: soft lockup - CPU#5 stuck for 26s!
[mdX_raid5:24608]
[ 893.262641] Modules linked in: dm_raid mptctl mptbase bonding
fcoe libfcoe 8021q garp mrp libfc stp llc scsi_transport_fc
sch_fq_codel softdog dm_round_robin scsi_dh_alua intel_rapl_msr
coretemp intel_rapl_common sb_edac x86_pkg_temp_thermal
intel_powerclamp ast kvm_intel drm_shmem_helper drm_kms_helper kvm
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3
sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl
intel_cstate wdat_wdt intel_pch_thermal ipmi_ssif dm_multipath
acpi_ipmi sg ipmi_si ipmi_devintf ipmi_msghandler acpi_pad
acpi_power_meter ramoops reed_solomon efi_pstore drm ip_tables x_tables
autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor xor async_tx raid6_pq libcrc32c raid0 ses enclosure mlx5_ib
ib_core raid1 ixgbe igb mlx5_core mdio_devres xfrm_algo i2c_algo_bit
e1000e mpt3sas hwmon dca mdio i2c_i801 ahci lpc_ich ptp i2c_smbus
mfd_core raid_class libahci pps_core libphy scsi_transport_sas
[ 893.264033] CPU: 5 PID: 24608 Comm: mdX_raid5 Not tainted 6.9.8-
bstrg #3
[ 893.264254] Hardware name: AIC HA401-LB2/LIBRA, BIOS LIBBV071
04/19/2017
[ 893.264479] RIP: 0010:_raw_spin_lock_irqsave+0xe/0x30
[ 893.264714] Code: 48 89 ef 5d e9 13 33 47 ff 0f 1f 00 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 48 89 fd
9c 41 5c <fa> bf 01 00 00 00 e8 07 96 43 ff 48 89 ef e8 df 32 47 ff 4c
89 e0
[ 893.265229] RSP: 0018:ffffbc306091fd70 EFLAGS: 00000246
[ 893.265502] RAX: 0000000000000001 RBX: 0000000000000001 RCX:
0000000000000000
[ 893.265783] RDX: 0000000000000001 RSI: 0000000000000003 RDI:
ffff9fbacf48c430
[ 893.266069] RBP: ffff9fbacf48c430 R08: ffff9fcac1254d58 R09:
00000000ffffffff
[ 893.266366] R10: 0000000000000000 R11: ffffbc306091fe20 R12:
0000000000000246
[ 893.266664] R13: ffff9fcac1254dc0 R14: 0000000000000000 R15:
0000000000739d18
[ 893.266968] FS: 0000000000000000(0000)
GS:ffff9fc9fff40000(0000) knlGS:0000000000000000
[ 893.267282] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 893.267603] CR2: 000055f91452ff88 CR3: 0000000124c74004 CR4:
00000000003706f0
[ 893.267933] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 893.268268] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 893.268605] Call Trace:
[ 893.268945] <IRQ>
[ 893.269285] ? watchdog_timer_fn+0x24b/0x2d0
[ 893.269638] ? __pfx_watchdog_timer_fn+0x10/0x10
[ 893.269990] ? __hrtimer_run_queues+0x112/0x2b0
[ 893.270347] ? hrtimer_interrupt+0x101/0x240
[ 893.270708] ? __sysvec_apic_timer_interrupt+0x6e/0x180
[ 893.271075] ? sysvec_apic_timer_interrupt+0x9d/0xd0
[ 893.271448] </IRQ>
[ 893.271816] <TASK>
[ 893.272184] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 893.272568] ? _raw_spin_lock_irqsave+0xe/0x30
[ 893.272956] __wake_up+0x1d/0x60
[ 893.273345] md_wakeup_thread+0x38/0x70
[ 893.273738] raid5d+0x32a/0x5d0 [raid456]
[ 893.274147] md_thread+0xc1/0x170
[ 893.274545] ? __pfx_autoremove_wake_function+0x10/0x10
[ 893.274951] ? __pfx_md_thread+0x10/0x10
[ 893.275358] kthread+0xff/0x130
[ 893.275760] ? __pfx_kthread+0x10/0x10
[ 893.276172] ret_from_fork+0x30/0x50
[ 893.276568] ? __pfx_kthread+0x10/0x10
[ 893.276961] ret_from_fork_asm+0x1a/0x30
[ 893.277351] </TASK>