Re: [PATCH md-6.10 5/9] md: replace sysfs api sync_action with new helpers

From: Yu Kuai
Date: Tue May 21 2024 - 22:46:23 EST


Hi,

On 2024/05/21 11:21, Xiao Ni wrote:
Hi Kuai

I've tested 07reshape5intr with the latest upstream kernel 15 times
without failure, so it would be worth running 07reshape5intr with
your patch set.

I just discussed this with Xiao on Slack; here is the conclusion:

The test 07reshape5intr adds a new disk to the array, then starts a
reshape:

mdadm /dev/md0 --add /dev/xxx
mdadm --grow /dev/md0 -n 3

However, the grow will fail:
mdadm: Failed to initiate reshape!

The root cause is that, in the kernel, action_store() returns -EBUSY
if MD_RECOVERY_RUNNING is set:

// mdadm add
add_bound_rdev
 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);

                        // daemon thread
                        md_check_recovery
                         set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
                         // do nothing
// mdadm grow
action_store
 if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
  return -EBUSY
                         clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery)

This is a long-standing problem, and we need new synchronization in
the kernel to make sure the grow won't fail.
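
To make the window easier to see, here is a minimal user-space sketch of
the interleaving above. It is only a model, not kernel code: the Mddev
class, the set-based flag storage, and the split of md_check_recovery
into "begin" and "end" steps are assumptions made for illustration. Only
the flag names and the -EBUSY check come from the trace.

```python
import errno

# Toy model of the race; flag names follow the trace, everything else
# (the class, the set of strings, the begin/end split) is hypothetical.
RUNNING = "MD_RECOVERY_RUNNING"
NEEDED = "MD_RECOVERY_NEEDED"


class Mddev:
    def __init__(self):
        self.recovery = set()  # stands in for the mddev->recovery bitmap


def add_bound_rdev(mddev):
    # mdadm --add: request that the daemon thread look at the array
    mddev.recovery.add(NEEDED)


def md_check_recovery_begin(mddev):
    # daemon thread: mark recovery as running before deciding what to do
    mddev.recovery.add(RUNNING)


def md_check_recovery_end(mddev):
    # daemon thread: nothing to do after all, so clear the flag again
    mddev.recovery.discard(RUNNING)


def action_store(mddev):
    # mdadm --grow: refuse while the daemon thread holds RUNNING
    if RUNNING in mddev.recovery:
        return -errno.EBUSY
    return 0


def grow_inside_window():
    # grow arrives between set_bit and clear_bit of RUNNING
    mddev = Mddev()
    add_bound_rdev(mddev)
    md_check_recovery_begin(mddev)
    rc = action_store(mddev)
    md_check_recovery_end(mddev)
    return rc


def grow_after_window():
    # grow arrives only after the daemon thread has cleared RUNNING
    mddev = Mddev()
    add_bound_rdev(mddev)
    md_check_recovery_begin(mddev)
    md_check_recovery_end(mddev)
    return action_store(mddev)


if __name__ == "__main__":
    print("grow inside window:", grow_inside_window())
    print("grow after window: ", grow_after_window())
```

In this model the grow fails only when it lands inside the short window
in which the daemon thread has set MD_RECOVERY_RUNNING without having
any work to do, which matches the intermittent failure described above.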

Thanks,
Kuai


Regards
Xiao




On Tue, May 21, 2024 at 11:02 AM Oliver Sang <oliver.sang@xxxxxxxxx> wrote:

hi, Yu Kuai,

On Tue, May 21, 2024 at 10:20:54AM +0800, Yu Kuai wrote:
Hi,

On 2024/05/20 23:01, kernel test robot wrote:


Hello,

kernel test robot noticed "mdadm-selftests.07reshape5intr.fail" on:

commit: 18effaab5f57ef44763e537c782f905e06f6c4f5 ("[PATCH md-6.10 5/9] md: replace sysfs api sync_action with new helpers")
url: https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-rearrange-recovery_flage/20240509-093248
base: https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git for-next
patch link: https://lore.kernel.org/all/20240509011900.2694291-6-yukuai1@xxxxxxxxxxxxxxx/
patch subject: [PATCH md-6.10 5/9] md: replace sysfs api sync_action with new helpers

in testcase: mdadm-selftests
version: mdadm-selftests-x86_64-5f41845-1_20240412
with following parameters:

disk: 1HDD
test_prefix: 07reshape5intr



compiler: gcc-13
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz (Haswell) with 16G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202405202204.4e3dc662-oliver.sang@xxxxxxxxx

2024-05-14 21:36:26 mkdir -p /var/tmp
2024-05-14 21:36:26 mke2fs -t ext3 -b 4096 -J size=4 -q /dev/sda1
2024-05-14 21:36:57 mount -t ext3 /dev/sda1 /var/tmp
sed -e 's/{DEFAULT_METADATA}/1.2/g' \
-e 's,{MAP_PATH},/run/mdadm/map,g' mdadm.8.in > mdadm.8
/usr/bin/install -D -m 644 mdadm.8 /usr/share/man/man8/mdadm.8
/usr/bin/install -D -m 644 mdmon.8 /usr/share/man/man8/mdmon.8
/usr/bin/install -D -m 644 md.4 /usr/share/man/man4/md.4
/usr/bin/install -D -m 644 mdadm.conf.5 /usr/share/man/man5/mdadm.conf.5
/usr/bin/install -D -m 644 udev-md-raid-creating.rules /lib/udev/rules.d/01-md-raid-creating.rules
/usr/bin/install -D -m 644 udev-md-raid-arrays.rules /lib/udev/rules.d/63-md-raid-arrays.rules
/usr/bin/install -D -m 644 udev-md-raid-assembly.rules /lib/udev/rules.d/64-md-raid-assembly.rules
/usr/bin/install -D -m 644 udev-md-clustered-confirm-device.rules /lib/udev/rules.d/69-md-clustered-confirm-device.rules
/usr/bin/install -D -m 755 mdadm /sbin/mdadm
/usr/bin/install -D -m 755 mdmon /sbin/mdmon
Testing on linux-6.9.0-rc2-00012-g18effaab5f57 kernel
/lkp/benchmarks/mdadm-selftests/tests/07reshape5intr... FAILED - see /var/tmp/07reshape5intr.log and /var/tmp/fail07reshape5intr.log for detail
[root@fedora mdadm]# ./test --dev=loop --tests=07reshape5intr
test: skipping tests for multipath, which is removed in upstream 6.8+
kernels
test: skipping tests for linear, which is removed in upstream 6.8+ kernels
Testing on linux-6.9.0-rc2-00023-gf092583596a2 kernel
/root/mdadm/tests/07reshape5intr... FAILED - see /var/tmp/07reshape5intr.log
and /var/tmp/fail07reshape5intr.log for details
(KNOWN BROKEN TEST: always fails)

So this test is already marked BROKEN.

Please share the whole log. Is it possible to share both logs?


We only captured one log, attached as log-18effaab5f.
The parent log is also attached, FYI.



Thanks,
Kuai




The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240520/202405202204.4e3dc662-oliver.sang@xxxxxxxxx





