Re: [PATCH v4] PCI: vmd: Add the module param to adjust MSI mode
From: Xinghui Li
Date: Sun Apr 02 2023 - 11:02:22 EST
On Thu, Mar 30, 2023 at 12:31 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Wed, Mar 29, 2023 at 04:57:08PM +0800, Xinghui Li wrote:
> > On Wed, Mar 29, 2023 at 5:34 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > It would also be nice to include a hint about why a user would choose
> > > "on" or "off". What is the performance effect? What sort of I/O
> > > scenario would lead you to choose "on" vs "off"?
> > >
> > Before this patch, I sent the patch named :
> > PCI: vmd: Do not disable MSI-X remapping in VMD 28C0 controller
> > (patchwork link:
> > https://patchwork.kernel.org/project/linux-pci/patch/20221222072603.1175248-1-korantwork@xxxxxxxxx/)
> > We found the 4k rand read's iops could drop 50% if 4 NVMEs were
> > mounted in one PCIE port with VMD MSI bypass.
> > I suppose this is because the VMD Controller can aggregate interrupts.
> > But those test result is so long that I didn't add them to this patch
> > commit log.
> > If you believe it is necessary, I will try to add some simple instructions
>
> I don't think we need detailed performance numbers, but we need
> something like:
>
> - "msi_remap=off" improves interrupt handling performance by
> avoiding the VMD MSI-X domain interrupt handler
>
> - But "msi_remap=on" is needed when ...?
>
In the patch I send in last email, We speculate that the VMD
Controller aggregate interrupts,
making the PCIe port less stressed and improving iops. In this
case(lots of 4k random IO), if we enable the VMD MSI
remapping, we found the interrupts from VMD Controller's MSI are less
and the IOPS is much higher.
> > > ee81ee84f873 ("PCI: vmd: Disable MSI-X remapping when possible")
> > > suggests that MSI-X remapping (I assume the "msi_remap=on" case):
> > >
> > > - Limits the number MSI-X vectors available to child devices to the
> > > number of VMD MSI-X vectors.
> > >
> > > - Reduces interrupt handling performance because child device
> > > interrupts have to go through the VMD MSI-X domain interrupt
> > > handler.
> > >
> > > So I assume "msi_remap=off" would remove that MSI-X vector limit and
> > > improve interrupt handling performance?
> > >
> > > But obviously there's more to consider because those are both good
> > > things and if we could do that all the time, we would. So there must
> > > be cases where we *have* to remap. ee81ee84f873 suggests that not all
> > > VMD devices support disabling remap. There's also a hint that some
> > > virt configs require it.
> > >
> > I used to just want to disable 28C0's VMD MSI bypass by default.
> > But Nirmal suggested the current method by adjusting the param.
> > Because he and other reviewers worry there are some other scenarios we
> > didn't consider.
> > Adding a method to adjust VMD'S MSI-X mode is better.
>
> This commit log doesn't outline any of those other scenarios, and it
> doesn't say anything about when "msi_remap=on" or "msi_remap=off"
> would be necessary or desired, so I have no idea how users are
> supposed to figure out whether or not to use this parameter.
>
I do miss this part. I'm sorry I ignored the users outside the
community or this discussion.
I will add the necessary guidelines.
> > > This patch doesn't enforce either of those things. What happens if
> > > the user gets it wrong?
> >
> > If I am wrong, please feel free to correct me at any time.
> > I place the "vmd_config_msi_remap_param" that is VMD MSI-X's mode
> > param configuring helper front
> > "vmd_enable_domain". So, It will not change the logic disabling
> > remapping from ee81ee84f873, such as
> > "Currently MSI remapping must be enabled in guest passthrough mode".
> > So, if the user config the wrong type, it will not work, and they can
> > find it by dmesg.
>
> That's kind of a problem. I'm not in favor of something failing and
> the user having to debug it via dmesg. That causes user frustration
> and problem reports.
What about adding a sysfs node for it in VMD PCI bus dir, which allows
users to catch VMD's MSI current working mode?
>
> I don't know what "guest passthrough mode" is. Can you detect that
> automatically?
I quote this from the commit ee81ee84f873's comment, it can be detected by the
logic like this:
if (!(features & VMD_FEAT_CAN_BYPASS_MSI_REMAP) ||
offset[0] || offset[1])
I just want to answer your comment: "There's also a hint that some
virt configs require it."
This patch will not modify the logic of determining whether MSI
remapping is enabled
when running VMD in Guest.
Thanks~