On 2025/8/30 17:51, Paul Menzel wrote:
Dear Nan,
Thank you for your patch.
On 30.08.25 at 11:05, linan666@xxxxxxxxxxxxxxx wrote:
From: Li Nan <linan122@xxxxxxxxxx>
The 'mddev->recovery' flags can change during md_do_sync(), leading to
inconsistencies. For example, starting with MD_RECOVERY_RECOVER and
ending with MD_RECOVERY_SYNC can cause incorrect offset updates.
Can you give a concrete example?
T1                              T2
md_do_sync
 action = ACTION_RECOVER
                                (write sysfs)
                                action_store
                                 set MD_RECOVERY_SYNC
 [ do recovery ]
 update resync_offset
The corresponding code is:
```
	if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) &&
	    mddev->curr_resync > MD_RESYNC_ACTIVE) {
		/* SYNC is set by now, but what we actually did was recovery */
		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
				if (mddev->curr_resync >= mddev->resync_offset) {
					pr_debug("md: checkpointing %s of %s.\n",
						 desc, mdname(mddev));
					if (test_bit(MD_RECOVERY_ERROR,
						     &mddev->recovery))
						mddev->resync_offset =
							mddev->curr_resync_completed;
					else
						mddev->resync_offset =
							mddev->curr_resync;
				}
```
To avoid this, use the 'action' determined at the beginning of the
function instead of repeatedly checking 'mddev->recovery'.
Do you have a reproducer?
I don't have a reproducer because reproducing it requires modifying the
kernel. The approximate steps are:
- Modify the kernel to add a delay before the above check.
- Trigger recovery by removing and adding disks.
- After recovery completes, while md_do_sync() is held at the added delay,
write to the sysfs interface to set the sync flag.
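
For illustration only, the steps above and the effect of the proposed change can be modelled in user space. This is a minimal sketch, not the kernel code: `toy_mddev`, `toy_action_store()`, `toy_do_sync()` and the `RECOVERY_*` constants are hypothetical stand-ins, the checkpoint logic is reduced to a single assignment, and the "delay point" is simply a direct call made between the recovery and the final check.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's recovery flag bits. */
enum { RECOVERY_SYNC = 1u << 0, RECOVERY_RECOVER = 1u << 1 };

/* Hypothetical stand-in for the action chosen when the sync starts. */
enum toy_action { TOY_ACTION_RESYNC, TOY_ACTION_RECOVER };

struct toy_mddev {
	unsigned int recovery;        /* shared flags, writable via "sysfs" */
	unsigned long curr_resync;    /* how far this run progressed */
	unsigned long resync_offset;  /* resync checkpoint */
};

/* Models the sysfs write (T2) that lands at the delay point. */
void toy_action_store(struct toy_mddev *m)
{
	m->recovery |= RECOVERY_SYNC;
}

/*
 * Models one pass of T1. With use_latched_action == false the final
 * check re-reads the shared flags (the racy pattern); with true it
 * uses the action latched on entry (the pattern the patch proposes).
 */
void toy_do_sync(struct toy_mddev *m, bool use_latched_action)
{
	enum toy_action action = (m->recovery & RECOVERY_SYNC) ?
				 TOY_ACTION_RESYNC : TOY_ACTION_RECOVER;
	bool is_resync;

	m->curr_resync = 1000;  /* [ do recovery ] */
	toy_action_store(m);    /* delay point: flags change under us */

	if (use_latched_action)
		is_resync = (action == TOY_ACTION_RESYNC);
	else
		is_resync = (m->recovery & RECOVERY_SYNC) != 0;

	/* update resync_offset, but only if this run was a resync */
	if (is_resync && m->curr_resync >= m->resync_offset)
		m->resync_offset = m->curr_resync;
}
```

Starting from `{ .recovery = RECOVERY_RECOVER }`, calling `toy_do_sync(&m, false)` leaves `resync_offset` at 1000 even though only a recovery ran, while `toy_do_sync(&m, true)` leaves it at 0.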