Re: stalling IO regression since linux 5.12, through 5.18
From: Jan Kara
Date: Wed Aug 17 2022 - 12:31:14 EST
On Wed 17-08-22 11:09:26, Chris Murphy wrote:
>
>
> On Wed, Aug 17, 2022, at 7:49 AM, Jan Kara wrote:
>
> >
> > Another thing worth trying is to compile the kernel without
> > CONFIG_BFQ_GROUP_IOSCHED. That will essentially disable cgroup support in
> > BFQ so we will see whether the problem may be cgroup related or not.
>
> The problem happens with a 5.12.0 kernel built without
> CONFIG_BFQ_GROUP_IOSCHED.
Thanks for testing! Just to answer your previous question: This is
different from cgroup.disable=io because BFQ takes different code paths. So
this makes it even less likely this is some obscure BFQ bug. One reason BFQ
could behave differently from mq-deadline here is that it artificially reduces
device queue depth (it sets shallow_depth when allocating new tags), and maybe
that triggers some bug in request tag allocation.
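To make the shallow_depth point concrete, here is a toy model in Python. It is
not kernel code: the class name ToyTagPool and the "only the first
shallow_depth tags are usable" rule are simplifying assumptions made for
illustration, and the real tag allocator is more involved. It only shows how a
caller with a shallow limit can fail to get a tag while the pool still has
free ones.

```python
class ToyTagPool:
    """Hypothetical, simplified tag pool; not the kernel's data structure."""

    def __init__(self, depth):
        self.depth = depth
        self.in_use = [False] * depth

    def allocate(self, shallow_depth=None):
        """Return a free tag index, or None if none is available.

        With shallow_depth set, only the first shallow_depth tags are
        considered (an assumption of this toy model).
        """
        limit = self.depth if shallow_depth is None else min(shallow_depth, self.depth)
        for tag in range(limit):
            if not self.in_use[tag]:
                self.in_use[tag] = True
                return tag
        return None

    def free(self, tag):
        """Mark a previously allocated tag as free again."""
        self.in_use[tag] = False


if __name__ == "__main__":
    pool = ToyTagPool(depth=8)
    # A depth-limited caller exhausts its share after two tags...
    print([pool.allocate(shallow_depth=2) for _ in range(3)])
    # ...while a caller using the full depth still gets one.
    print(pool.allocate())
```

In this model the depth-limited caller has to wait for one of its own tags to
be freed even though most of the pool is idle, so any problem in waking such
waiters would show up far more often with a scheduler that limits depth.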
BTW, are you sure the first problematic kernel is 5.12? Because support for
shared tagsets was added to the megaraid_sas driver in 5.11 (5.11-rc3 in
particular - commit 81e7eb5bf08f3 ("Revert "Revert "scsi: megaraid_sas:
Added support for shared host tagset for cpuhotplug"")) and that is one
candidate I'd expect to start to trigger issues. BTW that may be an
interesting thing to try: Can you boot with
"megaraid_sas.host_tagset_enable=0" kernel option and see whether the
issue reproduces?
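After rebooting, it may be worth confirming that the option actually took
effect. A minimal sketch in Python, assuming the driver exposes the parameter
read-only under /sys/module/megaraid_sas/parameters/ (the path is an
assumption and will not exist if the module is not loaded or does not export
the parameter); the helper name read_module_param is made up for this example.

```python
from pathlib import Path

# Assumed sysfs location of the module parameter; may not exist on every system.
PARAM = Path("/sys/module/megaraid_sas/parameters/host_tagset_enable")


def read_module_param(path=PARAM):
    """Return the parameter value as a stripped string, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    value = read_module_param()
    if value is None:
        print("parameter not found (module not loaded, or not exported)")
    else:
        print(f"host_tagset_enable = {value}")
```

A value of 0 would indicate that the shared host tagset is disabled for the
test run.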
Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR