Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook

From: Jens Axboe
Date: Fri Feb 09 2018 - 12:17:45 EST


On 2/9/18 6:21 AM, Oleksandr Natalenko wrote:
> Hi.
>
> 08.02.2018 08:16, Paolo Valente wrote:
>>> On 07 Feb 2018, at 23:18, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>> On 2/7/18 2:19 PM, Paolo Valente wrote:
>>>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
>>>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
>>>> be re-inserted into the active I/O scheduler for that device. As a
>>>> consequence, I/O schedulers may get the same request inserted again,
>>>> even several times, without a finish_request invoked on that request
>>>> before each re-insertion.
>>>>
>>>> This fact is the cause of the failure reported in [1]. For an I/O
>>>> scheduler, every re-insertion of the same re-prepared request is
>>>> equivalent to the insertion of a new request. For schedulers like
>>>> mq-deadline or kyber, this fact causes no harm. In contrast, it
>>>> confuses a stateful scheduler like BFQ, which keeps state for an I/O
>>>> request, until the finish_request hook is invoked on the request. In
>>>> particular, BFQ may get stuck, waiting forever for the number of
>>>> dispatches of the same request to be balanced by an equal number of
>>>> completions (while there will be only one completion for that
>>>> request). In this state, BFQ may refuse to serve I/O requests
>>>> from other bfq_queues. The hang reported in [1] then follows.
>>>>
>>>> However, the above re-prepared requests undergo a requeue, thus the
>>>> requeue_request hook of the active elevator is invoked for these
>>>> requests, if set. This commit then addresses the above issue by
>>>> properly implementing the hook requeue_request in BFQ.
>>>
>>> Thanks, applied.
>>>
>>
>> Hi Jens,
>> I forgot to add
>> Tested-by: Oleksandr Natalenko <oleksandr@xxxxxxxxxxxxxx>
>> in the patch.
>>
>> Is it still possible to add it?
>>
>
> In addition to this, I think it would be worth CC'ing Greg to pull
> this fix into the 4.15 stable tree.

I can't add the tested-by anymore, but it's easy enough to target for
stable after-the-fact.


--
Jens Axboe
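
For readers following the quoted commit message, the dispatch/completion
imbalance it describes can be illustrated with a toy model. This is only a
sketch under stated assumptions, not BFQ or blk-mq code: struct toy_sched
and every toy_* function are hypothetical names invented for illustration.
The model assumes a scheduler that counts each dispatch and expects one
completion per dispatch, and it shows how a requeue hook that undoes the
accounting could restore the balance.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code): a stateful scheduler that
 * counts dispatches not yet balanced by a finish. */
struct toy_sched {
    int in_flight;          /* dispatches still awaiting a completion */
    bool has_requeue_hook;  /* whether requeues undo the accounting */
};

/* The scheduler hands a request to the driver. */
static void toy_dispatch(struct toy_sched *s)
{
    s->in_flight++;
}

/* The request bounces back and will be re-inserted. Without a hook the
 * scheduler never learns that the earlier dispatch is void. */
static void toy_requeue(struct toy_sched *s)
{
    if (s->has_requeue_hook) {
        assert(s->in_flight > 0);
        s->in_flight--;
    }
}

/* The single completion that the request eventually gets. */
static void toy_finish(struct toy_sched *s)
{
    assert(s->in_flight > 0);
    s->in_flight--;
}

/* Dispatch one request, requeue and re-dispatch it `requeues` times,
 * then complete it once. Returns the count left unbalanced. */
static int toy_leftover(bool has_requeue_hook, int requeues)
{
    struct toy_sched s = { .in_flight = 0,
                           .has_requeue_hook = has_requeue_hook };

    toy_dispatch(&s);
    for (int i = 0; i < requeues; i++) {
        toy_requeue(&s);
        toy_dispatch(&s);
    }
    toy_finish(&s);
    return s.in_flight;
}
```

In this model, toy_leftover(false, 3) returns 3: three dispatches are never
balanced, which corresponds to the state in which the scheduler would wait
forever. toy_leftover(true, 3) returns 0, since each requeue cancels the
dispatch that preceded it.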