Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
From: Jens Axboe
Date: Fri Jan 19 2018 - 12:09:26 EST
On 1/19/18 10:05 AM, Ming Lei wrote:
> On Fri, Jan 19, 2018 at 09:52:32AM -0700, Jens Axboe wrote:
>> On 1/19/18 9:47 AM, Mike Snitzer wrote:
>>> On Fri, Jan 19 2018 at 11:41am -0500,
>>> Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>>> On 1/19/18 9:37 AM, Ming Lei wrote:
>>>>> On Fri, Jan 19, 2018 at 09:27:46AM -0700, Jens Axboe wrote:
>>>>>> On 1/19/18 9:26 AM, Ming Lei wrote:
>>>>>>> On Fri, Jan 19, 2018 at 09:19:24AM -0700, Jens Axboe wrote:
>>>>>>
>>>>>> There are no pending requests for this case, nothing to restart the
>>>>>> queue. When you fail that blk_get_request(), you are idle, nothing
>>>>>> is pending.
>>>>>
>>>>> I think we needn't worry about that, once a device is attached to
>>>>> dm-rq, it can't be mounted any more, and usually user don't use the device
>>>>> directly and by dm-mpath at the same time.
>>>>
>>>> Even if it doesn't happen for a normal dm setup, it is a case that
>>>> needs to be handled. The request allocation is just one example of
>>>> a wider scope resource that can be unavailable. If the driver returns
>>>> NO_DEV_RESOURCE (or whatever name), it will be a possibility that
>>>> the device itself is currently idle.
>>>
>>> How would a driver's resources be exhausted yet the device is idle (so
>>> as not to be able to benefit from RESTART)?
>>
>> I've outlined a number of these examples already. Another case might be:
>>
>> 1) Device is idle
>> 2) Device gets request
>> 3) Device attempts to DMA map
>> 4) DMA map fails because the IOMMU is out of space (nic is using it all)
>> 5) Device returns STS_RESOURCE
>> 6) Queue is marked as needing a restart
>>
>> All's well, except there is no IO on this device that will notice the
>> restart bit and retry the operation.
>>
>> Replace IOMMU failure with any other resource that the driver might need
>> for an IO, which isn't tied to a device specific resource.
>> blk_get_request() on dm is an example, as is any allocation failure
>> occurring in the queue IO path for the driver.
>
> Yeah, if we decide to take the approach of introducing NO_DEV_RESOURCE, all
> the current STS_RESOURCE for non-device resource allocation(kmalloc, dma
> map, get_request, ...) should be converted to NO_DEV_RESOURCE.
>
> And it is a generic issue, which need generic solution.
Precisely.
> Seems running queue after arbitrary in this case is the only way we
> thought of, or other solutions?
I think that is the only solution. If it's a frequent enough occurence
to cause performance issues, then it's likely down to a specific
resource shortage, and we can tackle that independently (we need to,
since each of those will need a specialized solution).
> If the decision is made, let's make/review the patch, :-)
Let 'er rip.
--
Jens Axboe