Could you explain how the same work can be re-queued? Isn't PENDING_BIT
already set in t3? I believe queue_work shouldn't do that per the comment,
but I am not an expert ...
This is not related to the workqueue itself; it is just because raid10
reinitializes a work item that is already queued.
I am trying to understand the possibility,
like I described later in t3:
t2:
md_check_recovery:
INIT_WORK -> clear pending
queue_work -> set pending
list_add_tail
...
t3: -> work is still pending
md_check_recovery:
INIT_WORK -> clear pending
queue_work -> set pending
list_add_tail -> list is corrupted
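To make the suspected corruption concrete, below is a minimal user-space model of the race
above (not kernel code; all the fake_* names are made up for illustration). As far as I
understand it, queue_work() only test-and-sets the pending bit and refuses work that is
already queued, while INIT_WORK() re-initialises both the pending bit and the list entry,
so re-initialising a work item that is still on the worklist lets it be linked a second time:

#include <stdbool.h>
#include <stdio.h>

struct fake_work {
	bool pending;                   /* models WORK_STRUCT_PENDING_BIT */
	struct fake_work *next;         /* models work->entry on the worklist */
};

static struct fake_work *worklist;      /* models the pool's pending list */

static void fake_init_work(struct fake_work *w)
{
	w->pending = false;             /* INIT_WORK clears the pending bit ... */
	w->next = NULL;                 /* ... and re-initialises the list entry */
}

static bool fake_queue_work(struct fake_work *w)
{
	if (w->pending)                 /* queue_work refuses already-pending work */
		return false;
	w->pending = true;
	w->next = worklist;             /* link the work onto the worklist */
	worklist = w;
	return true;
}

int main(void)
{
	struct fake_work work;

	fake_init_work(&work);
	fake_queue_work(&work);         /* t2: queued, pending bit set */

	fake_init_work(&work);          /* t3: work is still linked, state wiped */
	fake_queue_work(&work);         /* queued a second time: the node now
	                                 * links to itself, so walking the list
	                                 * never terminates */

	printf("self-loop: %s\n", worklist->next == worklist ? "yes" : "no");
	return 0;
}

In the real kernel the symptom would rather be a list corruption warning or an oops in
list_add, but the root cause is the same: the work's list entry is wiped while it is still
linked.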
First, t2 and t3 can't run in parallel since reconfig_mutex must be held. And if a sync_thread
existed, the second process would unregister and reap the sync_thread, which means the second
process would call INIT_WORK and queue_work again.
Maybe your description is valid; I would prefer to call work_pending() and flush_workqueue()
instead of INIT_WORK and queue_work.
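Something like the sketch below is what I have in mind. It is a rough, untested illustration
only; "sync_work" and "md_misc_wq" here are placeholders for whatever work item and workqueue
the real queueing site uses:

	/* Sketch only, not a tested patch: the work item is assumed to be
	 * initialised once at setup time, and only queued here. */
	if (work_pending(&mddev->sync_work)) {
		/* An earlier instance is still queued: wait for it to run
		 * instead of re-initialising a work item that is still on
		 * the worklist. */
		flush_workqueue(md_misc_wq);
	}
	queue_work(md_misc_wq, &mddev->sync_work);

The idea is that the work item is initialised once (for example when the mddev is set up) and
never re-initialised while it might still be pending.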
Of course, our 5.10 and mainline are the same.
Here are some tests:
First, the deadlock can be reproduced reliably; the test script is simple:
mdadm -Cv /dev/md0 -n 4 -l10 /dev/sd[abcd]
So this is raid10, while the previous problem appeared in raid456; I am not sure it is the same
issue, but let's see.
Then, the problem that MD_RECOVERY_RUNNING can be cleared can't be reproduced
reliably; usually it takes 2+ days to trigger a problem, and each time the
symptoms can be different. I'm hacking the kernel and adding
some BUG_ONs to test MD_RECOVERY_RUNNING in the attached patch; the following
test can trigger the BUG_ON:
Also, your debug patch obviously adds a large delay, which makes the calltrace happen; I doubt
a user can hit it in real life. Anyway, I will try the test below from my side.
mdadm -Cv /dev/md0 -e1.0 -n 4 -l 10 /dev/sd{a..d} --run
sleep 5
echo 1 > /sys/module/md_mod/parameters/set_delay
echo idle > /sys/block/md0/md/sync_action &
sleep 5
echo "want_replacement" > /sys/block/md0/md/dev-sdd/state
test result:
[ 228.390237] md_check_recovery: running is set
[ 228.391376] md_check_recovery: queue new sync thread
[ 233.671041] action_store unregister success! delay 10s
[ 233.689276] md_check_recovery: running is set
[ 238.722448] md_check_recovery: running is set
[ 238.723328] md_check_recovery: queue new sync thread
[ 238.724851] md_do_sync: before new wor, sleep 10s
[ 239.725818] md_do_sync: delay done
[ 243.674828] action_store delay done
[ 243.700102] md_reap_sync_thread: running is cleared!
[ 243.748703] ------------[ cut here ]------------
[ 243.749656] kernel BUG at drivers/md/md.c:9084!
After your debug patch is applied, does L9084 point to the line below?
9084 mddev->curr_resync = MaxSector;
I don't understand how it triggers the calltrace below, and it has nothing to do with
list corruption, right?
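For what it's worth, my guess at the kind of assertion the attached patch adds (purely
speculative, since the patch itself is not quoted here) is something along these lines, placed
right before the existing assignment in md_do_sync():

	/* Speculative reconstruction, not the actual attached patch: such an
	 * assertion would fire if MD_RECOVERY_RUNNING were cleared underneath
	 * md_do_sync(), matching the "running is cleared!" message above. */
	BUG_ON(!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
	mddev->curr_resync = MaxSector;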