Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

From: Dragan Stancevic
Date: Tue Sep 05 2023 - 13:10:04 EST


On 9/4/23 22:50, Yu Kuai wrote:
Hi,

On 2023/08/30 9:36, Yu Kuai wrote:
Hi,

On 2023/08/29 4:32, Dragan Stancevic wrote:

Just a follow-up on 6.1 testing: I tried to reproduce this problem for 5 days with the 6.1.42 kernel, without your patches, and was not able to reproduce it.

Oops, I forgot that you need to backport this patch first to reproduce
this problem:

https://lore.kernel.org/all/20230529132037.2124527-2-yukuai1@xxxxxxxxxxxxxxx/

The patch fixes the deadlock as well, but it introduces some regressions.

Ha, jinx :) I was about to email you that, with the testing over the weekend, I had isolated that change as what made it more difficult to reproduce in 6.1, and that the original change must be reverted :)




Thanks,
Kuai


It seems that 6.1 has some other code that prevents this from happening.


I see that there are lots of patches for raid456 between 5.10 and 6.1.
However, I remember that I was able to reproduce the deadlock after 6.1,
and it's true that it's not easy to reproduce; see below:

https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@xxxxxxxxxxxxxxx/

My guess is that the deadlock is harder to reproduce on 6.1 than on 5.10
due to some changes inside raid456.

By the way, raid10 had a similar deadlock, and it can be fixed the same
way, so it makes sense to backport these patches.

https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@xxxxxxxxxxxxxxx

Thanks,
Kuai


On 5.10 I can reproduce it within minutes to an hour.





--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla