Re: [PATCH] iommu/amd: Do not flush IRTE when only updating isRun and destination fields

From: Maxim Levitsky
Date: Thu Oct 19 2023 - 09:33:25 EST


On Thu, 2023-10-19 at 16:11 +0700, Suthikulpanit, Suravee wrote:
> Maxim,
>
> On 10/17/2023 10:36 PM, Suthikulpanit, Suravee wrote:
> > On 10/17/2023 9:51 PM, Maxim Levitsky wrote:
> > > У вт, 2023-10-17 у 09:42 -0500, Suravee Suthikulpanit пише:
> > > > According to the recent update in the AMD IOMMU spec [1], the IsRun and
> > > > Destination fields of the Interrupt Remapping Table Entry (IRTE) are not
> > > > cached by the IOMMU hardware.
> > > Is that true for all AMD hardware that supports AVIC? E.g Zen1/Zen2
> > > hardware?
> >
> > This is true for all AVIC/x2AVIC-capable IOMMU hardware in the past.
> >
> > > Is there a chance that this will cause a similar errata to the is_running
> > > errata that Zen2 cpus have?
> >
> > Please let me check on this and get back.
>
> Just to be sure, could you please tell me which errata number are you
> referring to here?

Hi!

The errata is this one:

https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/56323-PUB_1_01.pdf
(ERRATA #1235)

The errata meaning is that once the CPU sets is_running=0 in AVIC's physical ID table,
the hardware might still not cause #incomplete_ipi vmexit, for a few times.

This means that KVM is not notified of the pending interrupt and it lets the target vCPU block instead of being woken up.


In regard to the IOMMU, the parallel errata like situation would be that CPU sets is_running=0 in the IOMMU tables,
exactly afterwards an interrupt arrives and the IOMMU doesn't log the interrupt in the GA log.

Note that regardless if the IOMMU works correctly or not, the KVM has to check IRR after it sets is_running=0 to avoid memory
reordering races, but it does so and uses a memory barrier to ensure correctness.


PS: Recently I realized that a very simple workaround exists for errata #1235: At expense of the performance, the KVM can
permanently set is_running=0 in the AVIC's physical id table.

This will ensure that all ICR writes except self IPIs will cause a VM exit, thus in the expense of performance will
ensure that AVIC works correctly.

And this configuration while slower than full AVIC, it's still much better than having no AVIC,
because AVIC still accelerates the receiver side of IPIs, and also IOMMU can still post interrupts.
In fact with the workaround applied, AVIC is exactly equivalent to Intel's APICv without IPI virtualization.


This is the latest version of my patch series:

[PATCH v3 0/4] Allow AVIC's IPI virtualization to be optional
https://www.spinics.net/lists/kvm/msg328416.html

A help with a review will be highly appreciated.

In addition to that I would highly appreciate if AMD could respond to these questions:

1. Is the errata #1235 really not present on Zen1 CPUs? It doesn't appear in the revision guide,
but I still suspect that the guide might just not be up to date. I haven't tested a Zen1 machine
to see if I can reproduce it yet.

2. As I see AMD disabled AVIC on Zen3 machines, but it does appear to work just fine in my testing.
Is it possible that the reason for the disabling is also related to ICR/IPI emulation and that if I apply
the same workaround as for errata #1235, then AVIC could be safely enabled on Zen3,
despite support for it not being present in CPUID?

If I know the answer to these questions, it might be possible to enable AVIC on all current AMD hardware by default.

AFAIK (my personal opinion, based on various downstream bugs opened and their urgency) that our customers do care about APICv,
but somehow assume as a fact that AMD doesn't have such feature while in fact AMD does.

So I think that if AVIC were to be enabled by default on AMD, that would be something that AMD's (and ours)
customers would appreciate very much.

Best regards,
Maxim Levitsky

>
> Thanks,
> Suravee
>