Re: [PART2 RFC v1 5/9] iommu/amd: Introduce amd_iommu_update_ga()

From: Suravee Suthikulpanit
Date: Thu Jun 09 2016 - 19:59:50 EST


Hi Radim,

On 4/13/16 12:06, Radim Krčmář wrote:
2016-04-08 07:49-0500, Suravee Suthikulpanit:
From: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>

This patch introduces a new IOMMU interface, amd_iommu_update_ga(),
which allows KVM (SVM) to update existing posted interrupt IOMMU IRTE when
load/unload vcpu.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
---
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
@@ -4330,4 +4330,74 @@ int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
+int amd_iommu_update_ga(u32 vcpu_id, u32 cpu, u32 ga_tag,

It'd be nicer to generate the tag on SVM side and pass it whole -- IOMMU
doesn't have to care how hypervisors use the tag.

Actually, we already generate the tag on the SVM side (please see avic_get_next_tag() in patch 8). amd_iommu_update_ga() is meant to be called from the SVM side, and the tag is passed in here.


+ u64 base, bool is_run)
+{
+ unsigned long flags;
+ struct amd_iommu *iommu;
+
+ if (amd_iommu_guest_ir < AMD_IOMMU_GUEST_IR_GA)
+ return 0;
+
+ for_each_iommu(iommu) {
+ struct amd_ir_data *ir_data;
+
+ spin_lock_irqsave(&iommu->ga_hash_lock, flags);
+
+ hash_for_each_possible(iommu->ga_hash, ir_data, hnode,
+ AMD_IOMMU_GATAG(ga_tag, vcpu_id)) {

All tags can map into the same bucket. Code below doesn't check that
the ir_data belongs to the tag and will modify unrelated IRTEs.

Have you considered a per-VCPU list of IRTEs on the SVM side?

Actually, the hash key is basically vm-id and vcpu-id. So, this should get us all the ir_data for a specific vcpu in a particular VM.
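
To make the bucket-collision concern concrete, here is a minimal userspace sketch, not the kernel code. It assumes a 24-bit VM id and an 8-bit vcpu id packed into one tag (the real AMD_IOMMU_GATAG() layout is not shown in this thread), and it uses a toy chained hash table in place of the kernel hashtable. All names (make_gatag, ga_table, ga_update, ...) are hypothetical. The point is only that walking a bucket can visit entries with a different key, so the loop compares the stored tag before touching an entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GA_HASH_BITS 4
#define GA_HASH_SIZE (1u << GA_HASH_BITS)

/* Assumed packing: VM id in the upper 24 bits, vcpu id in the low 8 bits. */
static uint32_t make_gatag(uint32_t vm_id, uint32_t vcpu_id)
{
    return ((vm_id & 0xFFFFFFu) << 8) | (vcpu_id & 0xFFu);
}

/* Hypothetical stand-in for amd_ir_data: one remapped interrupt. */
struct ir_entry {
    uint32_t gatag;          /* key the entry was inserted under */
    uint32_t cpu;            /* physical CPU the vcpu last ran on */
    int is_run;              /* nonzero while the vcpu is loaded */
    struct ir_entry *next;   /* bucket chain */
};

struct ga_table {
    struct ir_entry *bucket[GA_HASH_SIZE];
};

/* Toy hash: different tags can easily land in the same bucket. */
static unsigned int ga_bucket(uint32_t gatag)
{
    return gatag & (GA_HASH_SIZE - 1);
}

static void ga_add(struct ga_table *t, struct ir_entry *e)
{
    unsigned int b = ga_bucket(e->gatag);

    e->next = t->bucket[b];
    t->bucket[b] = e;
}

/* Update every entry of one (VM, vcpu); returns how many were updated. */
static int ga_update(struct ga_table *t, uint32_t gatag, uint32_t cpu, int is_run)
{
    int updated = 0;

    assert(t != NULL);
    for (struct ir_entry *e = t->bucket[ga_bucket(gatag)]; e; e = e->next) {
        if (e->gatag != gatag)
            continue;        /* same bucket, different vcpu: leave it alone */
        e->cpu = cpu;
        e->is_run = is_run;
        updated++;
    }
    return updated;
}
```

Without the tag comparison in ga_update(), an entry that merely shares the bucket would be rewritten as well, which is the case the review comment describes.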

+ set_irte_ga(iommu, ir_data->irq_2_irte.devid,
+ base, cpu, is_run);

set_irte_ga() is pretty expensive -- do we need to invalidate the irt
when changing cpu and is_run?

You are right -- I think we can keep a pointer to the IRTE in amd_ir_data and use it to get to the IRTE directly when we need to update the GA-mode-related fields. That way, we don't need to go through the whole interrupt remapping table.

2.2.5.2 Interrupt Virtualization Tables with Guest Virtual APIC Enabled,
point 9, bullet 5 says that IRTE is read from memory before considering
IsRun, GATag and Destination, which makes me think that avoiding races
can be faster in the common case.

Right.
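
The cached-pointer idea discussed above could look roughly like the following userspace sketch. It is only an illustration under stated assumptions: toy_irte is a simplified, hypothetical entry (the real GA-mode IRTE has a different layout), toy_ir_data stands in for amd_ir_data, and the sketch ignores the concurrency question raised in the review, namely that the IOMMU may read the entry from memory while software is rewriting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified table entry; not the hardware IRTE format. */
struct toy_irte {
    uint32_t destination;    /* physical CPU to deliver to */
    uint8_t is_run;          /* 1 while the target vcpu is running */
    uint32_t ga_tag;         /* tag supplied by the hypervisor */
};

/* Hypothetical stand-in for amd_ir_data with a cached entry pointer. */
struct toy_ir_data {
    uint16_t devid;
    uint16_t index;          /* slot in the remapping table */
    struct toy_irte *entry;  /* resolved once, reused on every vcpu load/unload */
};

/* Slow path: resolve the slot from the table (done once per interrupt). */
static struct toy_irte *toy_lookup(struct toy_irte *table, size_t n, uint16_t index)
{
    return index < n ? &table[index] : NULL;
}

/* Cache the entry pointer when the interrupt is set up. */
static void toy_bind(struct toy_ir_data *d, struct toy_irte *table, size_t n)
{
    assert(d != NULL && table != NULL);
    d->entry = toy_lookup(table, n, d->index);
}

/* Fast path: touch only the fields that change on vcpu load/unload. */
static int toy_update_ga(struct toy_ir_data *d, uint32_t cpu, int is_run)
{
    if (!d->entry)
        return -1;           /* never bound, or index was out of range */
    d->entry->destination = cpu;
    d->entry->is_run = is_run ? 1 : 0;
    return 0;
}
```

The lookup cost is paid once in toy_bind(); toy_update_ga() then writes two fields through the cached pointer and never walks the table.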

I'm working on sending out V2 soon.

Thanks,
Suravee