Re: [PATCH] iommu/iova: Retry from last rb tree node if iova search fails

From: Vijayanand Jitta
Date: Mon May 11 2020 - 07:14:20 EST




On 5/9/2020 12:25 AM, Vijayanand Jitta wrote:
>
>
> On 5/7/2020 6:54 PM, Robin Murphy wrote:
>> On 2020-05-06 9:01 pm, vjitta@xxxxxxxxxxxxxx wrote:
>>> From: Vijayanand Jitta <vjitta@xxxxxxxxxxxxxx>
>>>
>>> Whenever a new iova alloc request comes in, the iova is always searched
>>> from the cached node and the nodes previous to the cached node. So even
>>> if there is free iova space available in the nodes next to the cached
>>> node, iova allocation can still fail because of this approach.
>>>
>>> Consider the following sequence of iova alloc and frees on
>>> 1GB of iova space
>>>
>>> 1) alloc - 500MB
>>> 2) alloc - 12MB
>>> 3) alloc - 499MB
>>> 4) free - 12MB which was allocated in step 2
>>> 5) alloc - 13MB
>>>
>>> After the above sequence we will have 12MB of free iova space, and the
>>> cached node will be pointing to the iova pfn of the last alloc of 13MB,
>>> which will be the lowest iova pfn of that iova space. Now if we get an
>>> alloc request of 2MB, we just search from the cached node and then look
>>> at lower iova pfns for free iova, and as there aren't any, the iova alloc
>>> fails even though there is 12MB of free iova space.
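
To make the sequence above concrete, here is a minimal user-space sketch. It is a toy model, not the kernel code: the 1GB space is a bitmap with one entry per MB, alignment is ignored, and the names toy_iova, toy_search, toy_alloc and toy_free are made up for illustration. cached_limit stands in for the pfn_lo of the cached node, and the update on free (move to the next allocated range above the freed one) is an assumption about how iova.c behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SPACE_MB 1024                 /* 1GB of iova space, one unit per MB */

/* Hypothetical toy domain; not the kernel's struct iova_domain. */
struct toy_iova {
    bool used[SPACE_MB];              /* true if that MB is allocated */
    int cached_limit;                 /* stand-in for the cached node's pfn_lo */
};

static void toy_init(struct toy_iova *d)
{
    memset(d->used, 0, sizeof(d->used));
    d->cached_limit = SPACE_MB;       /* nothing allocated: start at the top */
}

/* Highest free run of `size` units lying entirely below `limit`, or -1. */
static int toy_search(const struct toy_iova *d, int limit, int size)
{
    for (int lo = limit - size; lo >= 0; lo--) {
        int i = 0;

        while (i < size && !d->used[lo + i])
            i++;
        if (i == size)
            return lo;
    }
    return -1;
}

/* Allocate top-down below the cached position; optionally retry from the top. */
static int toy_alloc(struct toy_iova *d, int size, bool retry)
{
    int lo;

    assert(size > 0 && size <= SPACE_MB);
    lo = toy_search(d, d->cached_limit, size);
    if (lo < 0 && retry)
        lo = toy_search(d, SPACE_MB, size);   /* second pass over the whole space */
    if (lo < 0)
        return -1;
    for (int i = 0; i < size; i++)
        d->used[lo + i] = true;
    d->cached_limit = lo;                     /* cache the range just allocated */
    return lo;
}

static void toy_free(struct toy_iova *d, int lo, int size)
{
    int next = lo + size;

    for (int i = 0; i < size; i++)
        d->used[lo + i] = false;
    if (lo >= d->cached_limit) {
        /* Assumed cache update: move to the next allocated unit above. */
        while (next < SPACE_MB && !d->used[next])
            next++;
        d->cached_limit = next;
    }
}
```

Running the five steps from the commit message through this model and then asking for 2MB, toy_alloc(&d, 2, false) fails because the cached position is already at the bottom of the space, while toy_alloc(&d, 2, true) lands in the 12MB hole freed in step 4.
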
>>
>> Yup, this could definitely do with improving. Unfortunately I think this
>> particular implementation is slightly flawed...
>>
>>> To avoid such iova search failures, retry from the last rb tree node
>>> when the iova search fails. This will search the entire tree and get an
>>> iova if it's available.
>>>
>>> Signed-off-by: Vijayanand Jitta <vjitta@xxxxxxxxxxxxxx>
>>> ---
>>>  drivers/iommu/iova.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index 0e6a953..2985222 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -186,6 +186,7 @@ static int __alloc_and_insert_iova_range(struct
>>> iova_domain *iovad,
>>>      unsigned long flags;
>>>      unsigned long new_pfn;
>>>      unsigned long align_mask = ~0UL;
>>> +    bool retry = false;
>>>
>>>      if (size_aligned)
>>>          align_mask <<= fls_long(size - 1);
>>> @@ -198,6 +199,8 @@ static int __alloc_and_insert_iova_range(struct
>>> iova_domain *iovad,
>>>
>>>      curr = __get_cached_rbnode(iovad, limit_pfn);
>>>      curr_iova = rb_entry(curr, struct iova, node);
>>> +
>>> +retry_search:
>>>      do {
>>>          limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
>>>          new_pfn = (limit_pfn - size) & align_mask;
>>> @@ -207,6 +210,14 @@ static int __alloc_and_insert_iova_range(struct
>>> iova_domain *iovad,
>>>      } while (curr && new_pfn <= curr_iova->pfn_hi);
>>>
>>>      if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>>> +        if (!retry) {
>>> +            curr = rb_last(&iovad->rbroot);
>>
>> Why walk when there's an anchor node there already? However...
>>
>>> +            curr_iova = rb_entry(curr, struct iova, node);
>>> +            limit_pfn = curr_iova->pfn_lo;
>>
>> ...this doesn't look right, as by now we've lost the original limit_pfn
>> supplied by the caller, so are highly likely to allocate beyond the
>> range our caller asked for. In fact AFAICS we'd start allocating from
>> directly below the anchor node, beyond the end of the entire
>> address space.
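
A small sketch of the concern above, under stated assumptions: this is a toy bitmap model rather than the rb tree walk, SPACE stands in for the top of the address space, and search_below, alloc_retry_reset_limit and alloc_retry_keep_limit are hypothetical names. It only illustrates why the retry has to remember the caller's limit.

```c
#include <assert.h>
#include <stdbool.h>

#define SPACE 64   /* toy address space: units 0..63 */

/* Highest free run of `size` units lying entirely below `limit`, or -1. */
static int search_below(const bool used[SPACE], int limit, int size)
{
    assert(limit <= SPACE && size > 0);
    for (int lo = limit - size; lo >= 0; lo--) {
        int i = 0;

        while (i < size && !used[lo + i])
            i++;
        if (i == size)
            return lo;
    }
    return -1;
}

/* Retry that restarts from the top of the space: the caller's limit is lost. */
static int alloc_retry_reset_limit(const bool used[SPACE], int cached,
                                   int caller_limit, int size)
{
    int start = cached < caller_limit ? cached : caller_limit;
    int lo = search_below(used, start, size);

    if (lo < 0)
        lo = search_below(used, SPACE, size);         /* ignores caller_limit */
    return lo;
}

/* Retry that restarts from the caller's original limit instead. */
static int alloc_retry_keep_limit(const bool used[SPACE], int cached,
                                  int caller_limit, int size)
{
    int start = cached < caller_limit ? cached : caller_limit;
    int lo = search_below(used, start, size);

    if (lo < 0)
        lo = search_below(used, caller_limit, size);  /* stays below the limit */
    return lo;
}
```

With units 0..31 all in use and a caller limit of 32, the first variant hands back a range above the limit, while the second correctly reports failure until something below the limit is freed.
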
>>
>> The logic I was imagining we want here was something like the rapidly
>> hacked up (and untested) diff below.
>>
>> Thanks,
>> Robin.
>>
>
> Thanks for your comments. I have gone through the logic below and I see
> an issue with the retry check: there could be a case where alloc_lo is
> set to some pfn other than start_pfn, in which case we don't retry even
> though there can still be iova available. I understand it's a hacked-up
> version; I can work on this.
>
> But how about we just store limit_pfn, get the node using that, and
> retry once from that node? It would be similar to my patch, just
> correcting the curr node and limit_pfn update in the retry check. Do you
> see any issue with this approach?
>
>
> Thanks,
> Vijay.

I found one issue with my earlier approach: we search twice from the
cached node down to start_pfn. This can be avoided if we store the pfn_hi
of the cached node and use it as alloc_lo when we retry. I see the below
diff also does the same. I have posted a v2 of the patch after going
through the comments and the below diff. Can you please review that?
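
As a rough illustration of the idea (not the v2 patch itself), here is a toy two-pass search over a bitmap. The names search_range and alloc_two_pass are hypothetical, alignment is ignored, and the second pass is bounded below by cached_hi + 1 so the part already searched in the first pass is not walked again.

```c
#include <assert.h>
#include <stdbool.h>

#define SPACE 64   /* toy address space: units 0..63 */

/* Highest free run of `size` units within [lo_bound, hi_bound), or -1. */
static int search_range(const bool used[SPACE], int lo_bound, int hi_bound,
                        int size)
{
    assert(lo_bound >= 0 && hi_bound <= SPACE && size > 0);
    for (int lo = hi_bound - size; lo >= lo_bound; lo--) {
        int i = 0;

        while (i < size && !used[lo + i])
            i++;
        if (i == size)
            return lo;
    }
    return -1;
}

/*
 * Pass 1: from just below the cached range down to `start`.
 * Pass 2 (retry): from `limit` down to just above the cached range,
 * so the two passes never overlap.
 */
static int alloc_two_pass(const bool used[SPACE], int start, int cached_lo,
                          int cached_hi, int limit, int size)
{
    int lo = search_range(used, start, cached_lo, size);

    if (lo < 0)
        lo = search_range(used, cached_hi + 1, limit, size);
    return lo;
}
```

Here cached_lo and cached_hi play the role of the cached node's pfn_lo and pfn_hi, and limit is the caller's original limit_pfn.
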

Thanks,
Vijay
>> ----->8-----
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index 0e6a9536eca6..3574c19272d6 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -186,6 +186,7 @@ static int __alloc_and_insert_iova_range(struct
>> iova_domain *iovad,
>>         unsigned long flags;
>>         unsigned long new_pfn;
>>         unsigned long align_mask = ~0UL;
>> +       unsigned long alloc_hi, alloc_lo;
>>
>>         if (size_aligned)
>>                 align_mask <<= fls_long(size - 1);
>> @@ -196,17 +197,27 @@ static int __alloc_and_insert_iova_range(struct
>> iova_domain *iovad,
>>                         size >= iovad->max32_alloc_size)
>>                 goto iova32_full;
>>
>> +       alloc_hi = IOVA_ANCHOR;
>> +       alloc_lo = iovad->start_pfn;
>> +retry:
>>         curr = __get_cached_rbnode(iovad, limit_pfn);
>>         curr_iova = rb_entry(curr, struct iova, node);
>> +       if (alloc_hi < curr_iova->pfn_hi) {
>> +               alloc_lo = curr_iova->pfn_hi;
>> +               alloc_hi = limit_pfn;
>> +       }
>> +
>>         do {
>> -               limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
>> -               new_pfn = (limit_pfn - size) & align_mask;
>> +               alloc_hi = min(alloc_hi, curr_iova->pfn_lo);
>> +               new_pfn = (alloc_hi - size) & align_mask;
>>                 prev = curr;
>>                 curr = rb_prev(curr);
>>                 curr_iova = rb_entry(curr, struct iova, node);
>>         } while (curr && new_pfn <= curr_iova->pfn_hi);
>>
>> -       if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>> +       if (limit_pfn < size || new_pfn < alloc_lo) {
>> +               if (alloc_lo == iovad->start_pfn)
>> +                       goto retry;
>>                 iovad->max32_alloc_size = size;
>>                 goto iova32_full;
>>         }