Re: [PATCH v4 5/6] iommu/iova: Extend rbtree node caching

From: Robin Murphy
Date: Wed Sep 20 2017 - 13:28:09 EST


On 20/09/17 13:59, Tomasz Nowicki wrote:
> Hi Robin,
>
> On 19.09.2017 18:31, Robin Murphy wrote:
>> The cached node mechanism provides a significant performance benefit for
>> allocations using a 32-bit DMA mask, but in the case of non-PCI devices
>> or where the 32-bit space is full, the loss of this benefit can be
>> significant - on large systems there can be many thousands of entries in
>> the tree, such that walking all the way down to find free space every
>> time becomes increasingly awful.
>>
>> Maintain a similar cached node for the whole IOVA space as a superset of
>> the 32-bit space so that performance can remain much more consistent.
>>
>> Inspired by work by Zhen Lei <thunder.leizhen@xxxxxxxxxx>.
>>
>> Tested-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
>> Tested-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx>
>> Tested-by: Nate Watterson <nwatters@xxxxxxxxxxxxxx>
>> Signed-off-by: Robin Murphy <robin.murphy@xxxxxxx>
>> ---
>>
>> v4:
>>  - Adjust to simplified __get_cached_rbnode() behaviour
>>  - Cosmetic tweaks
>>
>>  drivers/iommu/iova.c | 43 +++++++++++++++++++++----------------------
>>  include/linux/iova.h |  3 ++-
>>  2 files changed, 23 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index c93a6c46bcb1..a125a5786dbf 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -51,6 +51,7 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>>          spin_lock_init(&iovad->iova_rbtree_lock);
>>          iovad->rbroot = RB_ROOT;
>> +        iovad->cached_node = NULL;
>>          iovad->cached32_node = NULL;
>>          iovad->granule = granule;
>>          iovad->start_pfn = start_pfn;
>> @@ -119,39 +120,38 @@ __get_cached_rbnode(struct iova_domain *iovad, unsigned long limit_pfn)
>>          if (limit_pfn <= iovad->dma_32bit_pfn && iovad->cached32_node)
>>                  return iovad->cached32_node;
>>
>> +        if (iovad->cached_node)
>> +                return iovad->cached_node;
>> +
>>          return &iovad->anchor.node;
>>  }
>>
>>  static void
>> -__cached_rbnode_insert_update(struct iova_domain *iovad,
>> -        unsigned long limit_pfn, struct iova *new)
>> +__cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
>>  {
>> -        if (limit_pfn != iovad->dma_32bit_pfn)
>> -                return;
>> -        iovad->cached32_node = &new->node;
>> +        if (new->pfn_hi < iovad->dma_32bit_pfn)
>> +                iovad->cached32_node = &new->node;
>> +        else
>> +                iovad->cached_node = &new->node;
>>  }
>>
>>  static void
>>  __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
>>  {
>>          struct iova *cached_iova;
>> -        struct rb_node *curr;
>> +        struct rb_node **curr;
>>
>> -        if (!iovad->cached32_node)
>> +        if (free->pfn_hi < iovad->dma_32bit_pfn)
>> +                curr = &iovad->cached32_node;
>> +        else
>> +                curr = &iovad->cached_node;
>> +
>> +        if (!*curr)
>>                  return;
>> -        curr = iovad->cached32_node;
>> -        cached_iova = rb_entry(curr, struct iova, node);
>>
>> -        if (free->pfn_lo >= cached_iova->pfn_lo) {
>> -                struct rb_node *node = rb_next(&free->node);
>> -                struct iova *iova = rb_entry(node, struct iova, node);
>> -
>> -                /* only cache if it's below 32bit pfn */
>> -                if (node && iova->pfn_lo < iovad->dma_32bit_pfn)
>> -                        iovad->cached32_node = node;
>> -                else
>> -                        iovad->cached32_node = NULL;
>> -        }
>> +        cached_iova = rb_entry(*curr, struct iova, node);
>> +        if (free->pfn_lo >= cached_iova->pfn_lo)
>> +                *curr = rb_next(&free->node);
>
> IMO, we may incorrectly update iovad->cached32_node here.
>
>                                   32-bit boundary
>      --------       --------             |            --------       --------
>     |        |     |        |            |           |        |     |        |
>  ---| IOVA0  |-----| IOVA1  |------------------------| IOVA3  |-----| anchor |
>     |        |     |        |            |           |        |     |        |
>      --------       --------             |            --------       --------
>
> If we free the last IOVA from the 32-bit space (IOVA1), we will update
> iovad->cached32_node to IOVA3.

That in itself is intentional and correct - the cached node is where we
*start* searching for free space, so if the topmost 32-bit pfn is free,
we want to start from whichever node represents the next-highest pfn.
Even if more 64-bit IOVAs were allocated below IOVA3 before the next
32-bit allocation, walking through those is better than walking all the
way down from the anchor.

However, you have made me realise that if IOVA3 were freed before the
next 32-bit allocation, we'd only update cached_node and leave
cached32_node with a stale pointer...
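
Roughly, I think the delete path needs to consider both caches rather
than picking one based on where the freed range sits, since
cached32_node may legitimately point above the boundary as per your
diagram. A quick sketch of the idea - untested, and just one way of
doing it, so not necessarily what v5 will end up looking like:

static void
__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
{
        struct iova *cached_iova;

        /* Check each cache regardless of where the freed range lies */
        if (iovad->cached32_node) {
                cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
                /* Freed range at or above the cached node: bump it up */
                if (free->pfn_lo >= cached_iova->pfn_lo)
                        iovad->cached32_node = rb_next(&free->node);
        }

        if (iovad->cached_node) {
                cached_iova = rb_entry(iovad->cached_node, struct iova, node);
                if (free->pfn_lo >= cached_iova->pfn_lo)
                        iovad->cached_node = rb_next(&free->node);
        }
}

Moving a cache up to rb_next(&free->node) is always safe here - at
worst it just means a slightly longer walk down on the next allocation.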

I'll get a v5 out tomorrow (and keep the new stuff at the end of the
series this time).

Thanks,
Robin.