Re: 6.0-rc1 regression block (blk_mq) / RCU task stuck errors + block-io hang
From: Hans de Goede
Date: Sat Aug 20 2022 - 11:37:50 EST
Hi Bart,
On 8/19/22 16:49, Bart Van Assche wrote:
> On 8/19/22 05:01, Hans de Goede wrote:
>> I've been dogfooding 6.0-rc1 on my main workstation and I have hit
>> this pretty serious bug, serious enough for me to go back to 5.19
>>
>> My dmesg is showing various blk_mq (RCU?) related lockdep splats
>> followed by some tasks getting stuck on disk-IO. E.g. "sync"
>> is guaranteed to hang, but other tasks too.
>>
>> This seems to be mainly the case on "sd" disks (both sata
>> and USB) where as my main nvme drive seems fine, which has
>> probably saved me from worse issues...
>>
>> Here are 4 task stuck reports from my last boot, where
>> I had to turn off the machine by keeping the power button
>> pressed for 4 seconds.
>>
>> [ ... ]
>>
>> Sorry for not being able to write a better bug-report but I don't have
>> the time to dive into this deeper. I hope this report is enough for
>> someone to have a clue what is going on.
>
> Thank you for the detailed report. I think this report is detailed enough to root-cause this issue, something that was not possible before this report.
>
> Please help with verifying whether this patch fixes this issue: "[PATCH] scsi: sd: Revert "Rework asynchronous resume support"" (https://lore.kernel.org/linux-scsi/20220816172638.538734-1-bvanassche@xxxxxxx/).
Thanks, that is very useful. I'm running 6.0-rc1 with this
patch added now and so far I've not seen the problem re-occur.

I was also seeing 6.0 suspend/resume issues on 2 laptops with
SATA disks (rather than NVMe), which I had not yet gotten around
to collecting logs from / reporting. I'm happy to report that
those suspend/resume issues are also fixed by this.

I'll reply to the patch with my Tested-by for this.
Regards,
Hans