RE: [PATCH 1/2] iommu/vt-d: Support batching IOTLB/dev-IOTLB invalidation commands

From: Zhang, Tina
Date: Tue Jun 04 2024 - 01:59:58 EST


Hi Baolu,

> -----Original Message-----
> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Tuesday, June 4, 2024 9:15 AM
> To: Zhang, Tina <tina.zhang@xxxxxxxxx>; Tian, Kevin <kevin.tian@xxxxxxxxx>
> Cc: baolu.lu@xxxxxxxxxxxxxxx; iommu@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 1/2] iommu/vt-d: Support batching IOTLB/dev-IOTLB
> invalidation commands
>
> On 6/3/24 3:37 PM, Zhang, Tina wrote:
> >> -----Original Message-----
> >> From: Baolu Lu<baolu.lu@xxxxxxxxxxxxxxx>
> >> Sent: Sunday, May 19, 2024 5:43 PM
> >> To: Zhang, Tina<tina.zhang@xxxxxxxxx>; Tian,
> >> Kevin<kevin.tian@xxxxxxxxx>
> >> Cc:baolu.lu@xxxxxxxxxxxxxxx;iommu@xxxxxxxxxxxxxxx; linux-
> >> kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH 1/2] iommu/vt-d: Support batching IOTLB/dev-IOTLB
> >> invalidation commands
> >>
> >> On 5/17/24 8:37 AM, Tina Zhang wrote:
> >>> Introduce a new parameter batch_desc to the QI based IOTLB/dev-IOTLB
> >>> invalidation operations to support batching invalidation descriptors.
> >>> This batch_desc is a pointer to the descriptor entry in a batch cmds
> >>> buffer. If the batch_desc is NULL, it indicates that batch
> >>> submission is not being used, and descriptors will be submitted individually.
> >>>
> >>> Also fix an issue reported by checkpatch about "unsigned mask":
> >>> "Prefer 'unsigned int' to bare use of 'unsigned'"
> >>>
> >>> Signed-off-by: Tina Zhang<tina.zhang@xxxxxxxxx>
> >>> ---
> >>> drivers/iommu/intel/cache.c | 33 +++++++++++-------
> >>> drivers/iommu/intel/dmar.c | 67 ++++++++++++++++++++-----------------
> >>> drivers/iommu/intel/iommu.c | 27 +++++++++------
> >>> drivers/iommu/intel/iommu.h | 21 ++++++++----
> >>> drivers/iommu/intel/pasid.c | 20 ++++++-----
> >>> 5 files changed, 100 insertions(+), 68 deletions(-)
> >>>
> >>> diff --git a/drivers/iommu/intel/cache.c
> >>> b/drivers/iommu/intel/cache.c index e8418cdd8331..dcf5e0e6af17
> >>> 100644
> >>> --- a/drivers/iommu/intel/cache.c
> >>> +++ b/drivers/iommu/intel/cache.c
> >>> @@ -278,7 +278,7 @@ void cache_tag_flush_range(struct dmar_domain
> >> *domain, unsigned long start,
> >>> case CACHE_TAG_NESTING_IOTLB:
> >>> if (domain->use_first_level) {
> >>> qi_flush_piotlb(iommu, tag->domain_id,
> >>> - tag->pasid, addr, pages, ih);
> >>> + tag->pasid, addr, pages, ih,
> >> NULL);
> >>> } else {
> >> I'd like to have all batched descriptors code inside this file to
> >> make it easier for maintenance. Perhaps we can add the below
> >> infrastructure in the dmar_domain structure together with the cache tag.
> > Does it suggest we need to add a batch version of
> qi_flush_iotlb/qi_flush_dev_iotlb/qi_flush_piotlb/qi_flush_dev_iotlb_pasid() in
> the cache.c file? It doesn't sound like an easy to maintain those functions, does
> it?
>
> Yes. I don't think it's that difficult as the helpers just compose a qi descriptor and
> insert it in a local queue. This local queue will be flushed after finishing iterating
> all cache tags, or there's no room for more descriptors, or switches to a different
> iommu. Have I missed anything?
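
Just to check that I read the flush conditions above correctly, here is a rough user-space C model of such a local queue. It is only a sketch: struct batch, batch_add(), batch_flush() and BATCH_MAX are hypothetical names made up for this mail, the integer iommu_id stands in for a struct intel_iommu pointer, and "flushing" only bumps a counter instead of submitting anything to hardware.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 4	/* hypothetical capacity, picked only for this example */

/* Stand-in for a 128-bit invalidation descriptor. */
struct desc {
	unsigned long long qw0, qw1;
};

/* Hypothetical local queue kept next to the cache tags. */
struct batch {
	int iommu_id;		/* stand-in for struct intel_iommu *, -1 = none */
	size_t used;		/* number of queued descriptors */
	unsigned int flushes;	/* number of submissions, for illustration */
	struct desc descs[BATCH_MAX];
};

static void batch_init(struct batch *b)
{
	b->iommu_id = -1;
	b->used = 0;
	b->flushes = 0;
}

/* In the driver this would submit descs[0..used) in one go. */
static void batch_flush(struct batch *b)
{
	if (!b->used)
		return;
	b->flushes++;
	b->used = 0;
}

static void batch_add(struct batch *b, int iommu_id, struct desc d)
{
	/* Trigger 1: the next descriptor targets a different IOMMU. */
	if (b->used && b->iommu_id != iommu_id)
		batch_flush(b);

	b->iommu_id = iommu_id;
	assert(b->used < BATCH_MAX);
	b->descs[b->used++] = d;

	/* Trigger 2: there is no room for more descriptors. */
	if (b->used == BATCH_MAX)
		batch_flush(b);
}
```

The third trigger, finishing the iteration over all cache tags, would be a final batch_flush() call from the caller once its loop ends.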

In the current VT-d driver, only the qi_flush_xxx() functions know how to build IOTLB invalidation descriptors. Each of them populates a VT-d invalidation descriptor and submits it to hardware immediately.

To support batching commands, I think we have two choices:
1. Extend the qi_flush_xxx() functions to support batching descriptors (as this version of the patch does).
This reuses the descriptor-populating logic that already lives in qi_flush_xxx(). The only additional change is the code that places the descriptor into a batch buffer.

2. Introduce a new set of interfaces that populate IOTLB descriptors and place them into a batch buffer.
These interfaces would be implemented in cache.c, and the logic for populating IOTLB descriptors would have to be copied from the qi_flush_xxx() functions. I hesitated to choose this option because it duplicates code. Maybe we could factor the descriptor-populating logic out into lower-level helpers and have both the current qi_flush_xxx() functions and the new interfaces call them.
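
To make the "lower-level helper" idea in option 2 concrete, below is a rough user-space C sketch. Everything in it is an assumption for illustration: fill_iotlb_desc(), flush_iotlb_now(), batch_add_iotlb() and batch_flush() are hypothetical names, the bit packing is a toy encoding and not the real VT-d descriptor layout, and submit() only counts descriptors instead of talking to the invalidation queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 8	/* hypothetical capacity, picked only for this example */

/* Stand-in for a 128-bit invalidation descriptor. */
struct desc {
	uint64_t qw0, qw1;
};

struct batch {
	size_t used;
	struct desc descs[BATCH_MAX];
};

static size_t submitted;	/* descriptors handed to the fake hardware */
static size_t submissions;	/* number of submit() calls */

/* Stand-in for the real submission path; it only counts. */
static void submit(const struct desc *descs, size_t count)
{
	(void)descs;
	submitted += count;
	submissions++;
}

/* Shared helper: the only place that knows the (toy) encoding. */
static void fill_iotlb_desc(struct desc *d, uint16_t did, uint64_t addr,
			    unsigned int size_order)
{
	d->qw0 = (uint64_t)did << 16;
	d->qw1 = (addr & ~0xfffULL) | (size_order & 0x3f);
}

/* Models an existing qi_flush_xxx(): populate, then submit at once. */
static void flush_iotlb_now(uint16_t did, uint64_t addr,
			    unsigned int size_order)
{
	struct desc d;

	fill_iotlb_desc(&d, did, addr, size_order);
	submit(&d, 1);
}

static void batch_flush(struct batch *b)
{
	if (!b->used)
		return;
	submit(b->descs, b->used);
	b->used = 0;
}

/* Models a new cache.c interface: populate, then queue in the batch. */
static void batch_add_iotlb(struct batch *b, uint16_t did, uint64_t addr,
			    unsigned int size_order)
{
	if (b->used == BATCH_MAX)
		batch_flush(b);
	assert(b->used < BATCH_MAX);
	fill_iotlb_desc(&b->descs[b->used++], did, addr, size_order);
}
```

With this split, the encoding exists once in fill_iotlb_desc(), and the immediate and batched paths differ only in what they do with the populated descriptor.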

Which option do you think is better?

Regards,
-Tina