Re: [PATCH -next V5] blk-mq: fix tag_get wait task can't be awakened

From: Jens Axboe
Date: Thu Jan 27 2022 - 13:07:02 EST


On 1/27/22 11:04 AM, Guenter Roeck wrote:
> On 1/27/22 09:28, Jens Axboe wrote:
>> On 1/26/22 6:32 PM, Guenter Roeck wrote:
>>> Hi,
>>>
>>> On Thu, Jan 13, 2022 at 10:55:36AM +0800, Laibin Qiu wrote:
>>>> In case of shared tags, there might be more than one hctx which
>>>> allocates from the same tags, and each hctx is limited to allocate at
>>>> most:
>>>> hctx_max_depth = max((bt->sb.depth + users - 1) / users, 4U);
>>>>
>>>> tag idle detection is lazy, and may be delayed for 30sec, so there
>>>> could be just one real active hctx(queue) but all others are actually
>>>> idle and still accounted as active because of the lazy idle detection.
>>>> Then if wake_batch is > hctx_max_depth, driver tag allocation may wait
>>>> forever on this real active hctx.
>>>>
>>>> Fix this by recalculating wake_batch when inc or dec active_queues.
>>>>
>>>> Fixes: 0d2602ca30e41 ("blk-mq: improve support for shared tags maps")
>>>> Suggested-by: Ming Lei <ming.lei@xxxxxxxxxx>
>>>> Suggested-by: John Garry <john.garry@xxxxxxxxxx>
>>>> Signed-off-by: Laibin Qiu <qiulaibin@xxxxxxxxxx>
>>>
>>> I understand this problem has been reported already, but still:
>>>
>>> This patch causes a hang in several of my qemu emulations when
>>> trying to boot from usb. Reverting it fixes the problem. Bisect log
>>> is attached.
>>>
>>> Boot logs are available at
>>> https://kerneltests.org/builders/qemu-arm-aspeed-master/builds/230/steps/qemubuildcommand/logs/stdio
>>> but don't really show much: the affected tests simply hang until they
>>> are aborted.
>>
>> This one got reported a few days ago, can you check if applying:
>>
>> https://git.kernel.dk/cgit/linux-block/commit/?h=block-5.17&id=10825410b956dc1ed8c5fbc8bbedaffdadde7f20
>>
>> fixes it for you?
>>
> Yes, it does.

Great, thanks for reporting/testing.

--
Jens Axboe
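
For readers following the arithmetic in the quoted commit message, here is a
minimal userspace sketch of the condition it describes. Only the
hctx_max_depth formula comes from the message itself. The function names,
the may_wait_forever() helper, and the example numbers are hypothetical and
are not taken from the kernel source or from the patch.

```c
#include <stdbool.h>

/*
 * Per-hctx allocation limit, following the formula quoted in the commit
 * message: max((depth + users - 1) / users, 4U).
 * "users" is the number of queues currently accounted as active and must
 * be nonzero.
 */
static unsigned int hctx_max_depth(unsigned int depth, unsigned int users)
{
	unsigned int share = (depth + users - 1) / users; /* ceil(depth / users) */

	return share > 4U ? share : 4U;
}

/*
 * Hypothetical helper: true when a waiter on the one really active hctx
 * could wait forever, i.e. when wake_batch exceeds the number of tags
 * that hctx is allowed to hold at once.
 */
static bool may_wait_forever(unsigned int wake_batch, unsigned int depth,
			     unsigned int users)
{
	return wake_batch > hctx_max_depth(depth, users);
}
```

With an assumed depth of 64 and 16 queues still accounted as active, the
limit is 4, so an assumed wake_batch of 8 can never be reached by a single
active hctx. With one active user the limit is 64 and the same wake_batch is
reachable, which is why the commit message proposes recalculating wake_batch
whenever active_queues changes.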