RE: [v7 4/8] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

From: Wu, Feng
Date: Mon May 25 2015 - 22:53:20 EST




> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@xxxxxxxxxxxxx]
> Sent: Monday, May 25, 2015 4:38 PM
> To: Wu, Feng
> Cc: joro@xxxxxxxxxx; dwmw2@xxxxxxxxxxxxx; jiang.liu@xxxxxxxxxxxxxxx;
> iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [v7 4/8] iommu, x86: No need to migrating irq for VT-d
> Posted-Interrupts
>
> On Mon, 25 May 2015, Feng Wu wrote:
>
> > We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> > When 'pst' is set in IRTE, the associated irq will be posted to
> > guests instead of interrupt remapping. The destination of the
> > interrupt is set in Posted-Interrupts Descriptor, and the migration
> > happens during vCPU scheduling.
> >
> > However, we still update the cached irte here, which can be used
> > when changing back to remapping mode.
> >
> > Signed-off-by: Feng Wu <feng.wu@xxxxxxxxx>
> > Reviewed-by: Jiang Liu <jiang.liu@xxxxxxxxxxxxxxx>
> > Acked-by: David Woodhouse <David.Woodhouse@xxxxxxxxx>
> > ---
> > drivers/iommu/intel_irq_remapping.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel_irq_remapping.c
> b/drivers/iommu/intel_irq_remapping.c
> > index 1955b09..646f4cf 100644
> > --- a/drivers/iommu/intel_irq_remapping.c
> > +++ b/drivers/iommu/intel_irq_remapping.c
> > @@ -994,7 +994,10 @@ intel_ir_set_affinity(struct irq_data *data, const
> struct cpumask *mask,
> > */
> > irte->vector = cfg->vector;
> > irte->dest_id = IRTE_DEST(cfg->dest_apicid);
> > - modify_irte(&ir_data->irq_2_iommu, irte);
> > +
> > + /* We don't need to modify irte if the interrupt is for posting. */
> > + if (irte->pst != 1)
> > + modify_irte(&ir_data->irq_2_iommu, irte);
>
> I don't think this is correct. ir_data->irte_entry contains the non
> posted version, which has pst == 0.
>
> You need some other way to store whether you are in posted mode or
> not.

Yes, seems this is incorrect. Thank you for pointing this out. After more
thinking about this, I think I can do it this way:
#1. Check the 'pst' field in hardware
#2. If 'pst' is 1, we don't update the IRTE in hardware.

However, the question is the check and update operations should be protected
by the same spinlock ' irq_2_ir_lock ', otherwise, race condition may happen.

Based on the above idea, I have two solutions for this, do you think which one
is better or you have other better suggestions? It is highly appreciated if you
can give comments about them!

Solution 1:
Introduction a new function test_and_modify_irte() which is called by intel_ir_set_affinity
in place of the original modify_irte().
Here is the changes:

+static int test_and_modify_irte(struct irq_2_iommu *irq_iommu,
+ struct irte *irte_modified)
+{
+ struct intel_iommu *iommu;
+ unsigned long flags;
+ struct irte *irte;
+ int rc, index;
+
+ if (!irq_iommu)
+ return -1;
+
+ raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
+
+ iommu = irq_iommu->iommu;
+
+ index = irq_iommu->irte_index + irq_iommu->sub_handle;
+ irte = &iommu->ir_table->base[index];
+
+ if (irte->pst)
+ goto unlock;
+
+ set_64bit(&irte->low, irte_modified->low);
+ set_64bit(&irte->high, irte_modified->high);
+ __iommu_flush_cache(iommu, irte, sizeof(*irte));
+
+ rc = qi_flush_iec(iommu, index, 0);
+unlock:
+ raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
+
+ return rc;
+}
+

Soluation 2:
Instead of introducing a new function, add a flag in the original modify_irte()
function to indicate that whether we need to check and return before updating
the real hardware, add pass 1 to return_on_pst in intel_ir_set_affinity()
Here is the changes:
static int modify_irte(struct irq_2_iommu *irq_iommu,
- struct irte *irte_modified)
+ struct irte *irte_modified
+ bool return_on_pst)
{
struct intel_iommu *iommu;
unsigned long flags;
@@ -140,11 +173,15 @@ static int modify_irte(struct irq_2_iommu *irq_iommu,
index = irq_iommu->irte_index + irq_iommu->sub_handle;
irte = &iommu->ir_table->base[index];

+ if (return_on_pst && irte->pst)
+ goto unlock;
+
set_64bit(&irte->low, irte_modified->low);
set_64bit(&irte->high, irte_modified->high);
__iommu_flush_cache(iommu, irte, sizeof(*irte));

rc = qi_flush_iec(iommu, index, 0);
+unlock:
raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);

return rc;

Thanks,
Feng

>
> Thanks,
>
> tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/