Re: [PATCH BUGFIX V2] block, bfq: update wr_busy_queues if needed on a queue split

From: Jens Axboe
Date: Wed Jun 28 2017 - 09:51:28 EST


On 06/28/2017 07:44 AM, Paolo Valente wrote:
>
>> Il giorno 28 giu 2017, alle ore 14:42, Jens Axboe <axboe@xxxxxxxxx> ha scritto:
>>
>> On 06/27/2017 11:39 PM, Paolo Valente wrote:
>>>
>>>> Il giorno 27 giu 2017, alle ore 20:29, Jens Axboe <axboe@xxxxxxxxx> ha scritto:
>>>>
>>>> On 06/27/2017 12:27 PM, Paolo Valente wrote:
>>>>>
>>>>>> Il giorno 27 giu 2017, alle ore 16:41, Jens Axboe <axboe@xxxxxxxxx> ha scritto:
>>>>>>
>>>>>> On 06/27/2017 12:09 AM, Paolo Valente wrote:
>>>>>>>
>>>>>>>> Il giorno 19 giu 2017, alle ore 13:43, Paolo Valente <paolo.valente@xxxxxxxxxx> ha scritto:
>>>>>>>>
>>>>>>>> This commit fixes a bug triggered by a non-trivial sequence of
>>>>>>>> events. These events are briefly described in the next two
>>>>>>>> paragraphs. The impatiens, or those who are familiar with queue
>>>>>>>> merging and splitting, can jump directly to the last paragraph.
>>>>>>>>
>>>>>>>> On each I/O-request arrival for a shared bfq_queue, i.e., for a
>>>>>>>> bfq_queue that is the result of the merge of two or more bfq_queues,
>>>>>>>> BFQ checks whether the shared bfq_queue has become seeky (i.e., if too
>>>>>>>> many random I/O requests have arrived for the bfq_queue; if the device
>>>>>>>> is non rotational, then random requests must be also small for the
>>>>>>>> bfq_queue to be tagged as seeky). If the shared bfq_queue is actually
>>>>>>>> detected as seeky, then a split occurs: the bfq I/O context of the
>>>>>>>> process that has issued the request is redirected from the shared
>>>>>>>> bfq_queue to a new non-shared bfq_queue. As a degenerate case, if the
>>>>>>>> shared bfq_queue actually happens to be shared only by one process
>>>>>>>> (because of previous splits), then no new bfq_queue is created: the
>>>>>>>> state of the shared bfq_queue is just changed from shared to non
>>>>>>>> shared.
>>>>>>>>
>>>>>>>> Regardless of whether a brand new non-shared bfq_queue is created, or
>>>>>>>> the pre-existing shared bfq_queue is just turned into a non-shared
>>>>>>>> bfq_queue, several parameters of the non-shared bfq_queue are set
>>>>>>>> (restored) to the original values they had when the bfq_queue
>>>>>>>> associated with the bfq I/O context of the process (that has just
>>>>>>>> issued an I/O request) was merged with the shared bfq_queue. One of
>>>>>>>> these parameters is the weight-raising state.
>>>>>>>>
>>>>>>>> If, on the split of a shared bfq_queue,
>>>>>>>> 1) a pre-existing shared bfq_queue is turned into a non-shared
>>>>>>>> bfq_queue;
>>>>>>>> 2) the previously shared bfq_queue happens to be busy;
>>>>>>>> 3) the weight-raising state of the previously shared bfq_queue happens
>>>>>>>> to change;
>>>>>>>> the number of weight-raised busy queues changes. The field
>>>>>>>> wr_busy_queues must then be updated accordingly, but such an update
>>>>>>>> was missing. This commit adds the missing update.
>>>>>>>>
>>>>>>>
>>>>>>> Hi Jens,
>>>>>>> any idea of the possible fate of this fix?
>>>>>>
>>>>>> I sort of missed this one. It looks trivial enough for 4.12, or we
>>>>>> can defer until 4.13. What do you think?
>>>>>>
>>>>>
>>>>> It should actually be something trivial, and hopefully correct,
>>>>> because a further throughput improvement (for BFQ), which depends on
>>>>> this fix, is now working properly, and we didn't see any regression so
>>>>> far. In addition, since this improvement is virtually ready for
>>>>> submission, further steps may be probably easier if this fix gets in
>>>>> sooner (whatever the luck of the improvement will be).
>>>>
>>>> OK, let's queue it up for 4.13 then.
>>>>
>>>
>>> My arguments was in favor of 4.12 actually. Maybe you did mean 4.12
>>> here?
>>
>> You were talking about further improvements and new development on top
>> of this, so I assumed you meant 4.13. However, further development is
>> not the main criteria or concern for whether this fix should go into
>> 4.12 or not.
>
> Ok, thanks for your explanation and patience.
>
>> The main concern is if this fixes something that is crucial
>> to have in 4.12. It's late in the cycle, I'd rather not push anything
>> that isn't a regression fix at this point.
>>
>
> Hard to assess precisely how crucial this is. Certainly it fixes a
> regression. The practical, negative effects of this regression are
> systematic when one tries to add the throughput improvement I
> mentioned: the improvement almost never works. If BFQ is used as it
> is, then negative effects on throughput are less likely to happen.
>
> I hope that this piece of information is somehow useful for your
> decision.

If it's only really visible with the other change on top, then I think
we should defer to 4.13. It's not a kernel regression in the most
clinical sense, since BFQ wasn't available in 4.12.

--
Jens Axboe