Re: [PATCH 64/67] iommu/amd: KVM: SVM: Allow KVM to control need for GA log interrupts
From: Joao Martins
Date: Wed Apr 09 2025 - 07:57:03 EST
On 04/04/2025 20:39, Sean Christopherson wrote:
> Add plumbing to the AMD IOMMU driver to allow KVM to control whether or
> not an IRTE is configured to generate GA log interrupts. KVM only needs a
> notification if the target vCPU is blocking, so the vCPU can be awakened.
> If a vCPU is preempted or exits to userspace, KVM clears is_run, but will
> set the vCPU back to running when userspace does KVM_RUN and/or the vCPU
> task is scheduled back in, i.e. KVM doesn't need a notification.
>
> Unconditionally pass "true" in all KVM paths to isolate the IOMMU changes
> from the KVM changes insofar as possible.
>
> Opportunistically swap the ordering of parameters for amd_iommu_update_ga()
> so that the match amd_iommu_activate_guest_mode().
Unfortunately I think this patch and the next one might be riding on the
assumption that amd_iommu_update_ga() is always cheap :( -- see below.
I didn't spot anything else flawed in the series though, just this one. I would
suggest holding off on this and the next one, while progressing with the rest of
the series.
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 2e016b98fa1b..27b03e718980 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> -static void __amd_iommu_update_ga(struct irte_ga *entry, int cpu)
> +static void __amd_iommu_update_ga(struct irte_ga *entry, int cpu,
> + bool ga_log_intr)
> {
> if (cpu >= 0) {
> entry->lo.fields_vapic.destination =
> @@ -3783,12 +3784,14 @@ static void __amd_iommu_update_ga(struct irte_ga *entry, int cpu)
> entry->hi.fields.destination =
> APICID_TO_IRTE_DEST_HI(cpu);
> entry->lo.fields_vapic.is_run = true;
> + entry->lo.fields_vapic.ga_log_intr = false;
> } else {
> entry->lo.fields_vapic.is_run = false;
> + entry->lo.fields_vapic.ga_log_intr = ga_log_intr;
> }
> }
>
isRun, Destination and GATag are not cached. Quoting the update from a few years
back (page 93 of IOMMU spec dated Feb 2025):
| When virtual interrupts are enabled by setting MMIO Offset 0018h[GAEn] and
| IRTE[GuestMode=1], IRTE[IsRun], IRTE[Destination], and if present IRTE[GATag],
| are not cached by the IOMMU. Modifications to these fields do not require an
| invalidation of the Interrupt Remapping Table.
This is the reason we were able to get rid of the IOMMU invalidation in
amd_iommu_update_ga() ... which sped up vmexit/vmenter flow with iommu avic.
Besides the lock contention that was observed at the time, we were seeing stalls
in this path with enough vCPUs IIRC; CCing Alejandro to keep me honest.
Now this change above is incorrect as is and to make it correct: you will need
xor with the previous content of the IRTE::ga_log_intr and then if it changes
then you re-add back an invalidation command via
iommu_flush_irt_and_complete()). The latter is what I am worried will
reintroduce these above problem :(
The invalidation command (which has a completion barrier to serialize
invalidation execution) takes some time in h/w, and will make all your vcpus
content on the irq table lock (as is). Even assuming you somehow move the
invalidation outside the lock, you will content on the iommu lock (for the
command queue) or best case assuming no locks (which I am not sure it is
possible) you will need to wait for the command to complete until you can
progress forward with entering/exiting.
Unless the GALogIntr bit is somehow also not cached too which wouldn't need the
invalidation command (which would be good news!). Adding Suravee/Vasant here.
It's a nice trick how you would leverage this in SVM, but do you have
measurements that corroborate its introduction? How many unnecessary GALog
entries were you able to avoid with this trick say on a workload that would
exercise this (like netperf 1byte RR test that sleeps and wakes up a lot) ?
I should also mention that there's a different logic that is alternative to
GALog (in Genoa or more recent), which is GAPPI support whereby an entry is
generated but a more rare condition. Quoting the an excerpts below:
| This mode is enabled by setting MMIO Offset 0018h[GAPPIEn]=1. Under this
| mode, guest interrupts (IRTE[GuestMode]=1) update the vAPIC backing page IRR
| status as normal.
| In GAPPI mode, a GAPPI interrupt is generated if all of the guest IRR bits
| were previously clear prior to the last IRR update. This indicates the new
| interrupt is the first pending interrupt to the
| vCPU. The GAPPI interrupt is used to signal Hypervisor software to schedule
| one or more vCPUs to execute pending interrupts.
| Implementations may not be able to perfectly determine if all of the IRR bits
| were previously clear prior to updating the vAPIC backing page to set IRR.
| Spurious interrupts may be generated as a
| result. Software must be designed to handle this possibility
Page 99, "2.2.5.5 Guest APIC Physical Processor Interrupt",
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/specifications/48882_IOMMU.pdf