Re: [PATCH v2 3/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE

From: Suthikulpanit, Suravee
Date: Fri Sep 06 2024 - 10:08:25 EST


On 9/6/2024 1:21 AM, Jason Gunthorpe wrote:
I don't think you should restore, this should reflect a locking error
but we still need to move forward and put some kind of correct
data.. The code can't go backwards so it should try to move forwards..
In case of error, what if we pr_warn and put the device in blocking mode
since we need to prevent malicious DMAs.
IMHO a WARN_ON is fine, and alerts to the possible machine corruption
No need to do blocking, you should have a perfectly valid target DTE
that represents the state the HW is expected to be in. Resolve the
race by making it bin that state and move forwards.

What do you mean by "making it bin that state".

On ordering, I don't know, is this OK?

If you are leaving/entering nesting mode I think you have to write the
[2] value in the right sequence, you don't want to have the viommu
enabled unless the host page table is setup properly. So [2] is
written last when enabling, and first when disabling. Flushes required
after each write to ensure the HW doesn't see a cross-128 word bit
tear.
GuestPagingMode also has to be sequenced correctly, the GCR3 table
pointer should be invalid when it is changed, meaning you have to
write it and flush before storing the GCR3 table, and the reverse to
undo it.

The ordering, including when DTE flushes are needed, is pretty
hard. This is much simpler than, say, ARM, so I think you could open
code it, but it should be a pretty sizable bit of logic to figure out
what to do.
IOMMU hardware do not do partial interpret of the DTE and SW ensure DTE
flush after updating the DTE. Therefore, ordering should not be of a concern
here as long as the driver correctly program the entry.
Even if the IOMMU HW does a perfect 256 bit atomic read you still have
to order the CPU writes correctly. It just means you don't need to
flush.

The guidelines in "2.2.2.2 Making Device Table Entry Changes" make
this clear. The indivudal CPU writes smaller than 256 bits have to be
sequenced right.

For the interrupt remapping part, no special step is needed if we can write do 64-bit write. Similary, for the address translation part, no special step is needed if we can do 128-bit write.

This section looks like it was written before translation bits were
placed in the other 128 bit word - it assumes a single 128 bit write
is always sufficient which isn't true anymore.

So you still have the issue of having to decide if you write 128 bit
[0] or [1] first.

The GuestPagingMode bit is in effect when GV=1. So, the higher 128-bit (which contains GuestPagingMode bit) should be written first, and follow by lower 128-bit (which contans GV bit).

Thanks,
Suravee