RE: [PATCH] iommu/arm-smmu-v3: remove the approach of MSI polling for CMD SYNC

From: Song Bao Hua (Barry Song)
Date: Mon Jul 20 2020 - 20:44:58 EST




> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Friday, July 17, 2020 9:06 PM
> To: 'Robin Murphy' <robin.murphy@xxxxxxx>; will@xxxxxxxxxx;
> joro@xxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx;
> Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>
> Subject: RE: [PATCH] iommu/arm-smmu-v3: remove the approach of MSI
> polling for CMD SYNC
>
>
>
> > -----Original Message-----
> > From: Robin Murphy [mailto:robin.murphy@xxxxxxx]
> > Sent: Friday, July 17, 2020 8:55 PM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>;
> will@xxxxxxxxxx;
> > joro@xxxxxxxxxx
> > Cc: linux-kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>;
> > linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx;
> > Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>
> > Subject: Re: [PATCH] iommu/arm-smmu-v3: remove the approach of MSI
> > polling for CMD SYNC
> >
> > On 2020-07-17 00:07, Barry Song wrote:
> > > Before commit 587e6c10a7ce ("iommu/arm-smmu-v3: Reduce contention
> > > during command-queue insertion"), MSI polling perhaps performed
> > > better since it could run outside the spin_lock_irqsave() while the
> > > code polling the cons register was running inside the lock.
> > >
> > > But after the great reorganization of the SMMU queue, neither of
> > > these two polling methods runs in a spinlock. And real tests show
> > > that polling the cons register via SEV gives smaller latency,
> > > probably because polling by MSI asks the hardware to write to memory
> > > while SEV polling depends only on a register update.
> > >
> > > Using 16 threads to run netperf on an hns3 100G NIC with a UDP
> > > packet size of 32768 bytes and the IOMMU set to strict, TX
> > > throughput improves from 25227.74Mbps to 27145.59Mbps with this
> > > patch. In this case, the SMMU is super busy as hns3 sends map/unmap
> > > requests extremely frequently.
> >
> > How many different systems and SMMU implementations are those numbers
> > representative of? Given that we may have cases where the SMMU can use
> > MSIs but can't use SEV, so would have to fall back to inefficient
> > busy-polling, I'd be wary of removing this entirely. Allowing particular
> > platforms or SMMU implementations to suppress MSI functionality if they
> > know for sure it makes sense seems like a safer bet.
> >
> Hello Robin,
>
> Thanks for taking a look. Actually I was really struggling to find a
> good way to make every platform happy, and I don't have other platforms
> on which to test whether they run better with SEV polling. Even if two
> platforms have exactly the same SMMU features, they may still behave
> differently. So I simply sent this patch to start the discussion and
> gather opinions.
>
> At first, I wanted to have a module parameter for users to decide
> whether MSI polling should be disabled, but a module parameter might be
> totally ignored by a Linux distro.
>
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -418,6 +418,11 @@ module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
>  MODULE_PARM_DESC(disable_bypass,
>  	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
>
> +static bool disable_msipolling = 1;
> +module_param_named(disable_msipolling, disable_msipolling, bool, S_IRUGO);
> +MODULE_PARM_DESC(disable_msipolling,
> +	"Don't use MSI to poll the completion of CMD_SYNC if it is slower than SEV");
> +
>  enum pri_resp {
>  	PRI_RESP_DENY = 0,
>  	PRI_RESP_FAIL = 1,
> @@ -992,7 +997,7 @@ static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
>  	 * Beware that Hi16xx adds an extra 32 bits of goodness to its MSI
>  	 * payload, so the write will zero the entire command on that platform.
>  	 */
> -	if (smmu->features & ARM_SMMU_FEAT_MSI &&
> +	if (!disable_msipolling && smmu->features & ARM_SMMU_FEAT_MSI &&
>  	    smmu->features & ARM_SMMU_FEAT_COHERENCY) {
>  		ent.sync.msiaddr = q->base_dma + Q_IDX(&q->llq, prod) *
>  				   q->ent_dwords * 8;
> @@ -1332,7 +1337,7 @@ static int __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu,
>  static int arm_smmu_cmdq_poll_until_sync(struct arm_smmu_device *smmu,
>  					 struct arm_smmu_ll_queue *llq)
>  {
> -	if (smmu->features & ARM_SMMU_FEAT_MSI &&
> +	if (!disable_msipolling && smmu->features & ARM_SMMU_FEAT_MSI &&
>  	    smmu->features & ARM_SMMU_FEAT_COHERENCY)
>  		return __arm_smmu_cmdq_poll_until_msi(smmu, llq);
>
> Another option is that we don't use a module parameter; alternatively,
> we check the vendor/chip ID, and if the chip has better performance
> with SEV polling, it may set disable_msipolling to true.
>
> You are very welcome to give your suggestions.

A possible way to do some chip-specific configuration would be setting
smmu->options according to the model ID:

static void acpi_smmu_get_options(u32 model, struct arm_smmu_device *smmu)
{
	switch (model) {
	case ACPI_IORT_SMMU_V3_CAVIUM_CN99XX:
		smmu->options |= ARM_SMMU_OPT_PAGE0_REGS_ONLY;
		break;
	case ACPI_IORT_SMMU_V3_HISILICON_HI161X:
		smmu->options |= ARM_SMMU_OPT_SKIP_PREFETCH;
		break;
+	case ACPI_IORT_SMMU_V3_HISILICON_HI162X:
+		smmu->options |= ARM_SMMU_OPT_DISABLE_MSIPOLL;
+		break;
	}

	dev_notice(smmu->dev, "option mask 0x%x\n", smmu->options);
}

I dumped the model ID, but unfortunately it is just zero:

#define ACPI_IORT_SMMU_V3_GENERIC 0x00000000 /* Generic SMMUv3 */

Robin,
do you think applying for a new model ID is the right way to set this chip-specific option?

Thanks
Barry