RE: [PATCH] PCI: hv: Do not set PCI_COMMAND_MEMORY to reduce VM boot time

From: Dexuan Cui
Date: Fri Apr 29 2022 - 11:47:33 EST

> From: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>
> Sent: Friday, April 29, 2022 3:15 AM
> To: Dexuan Cui <decui@xxxxxxxxxxxxx>
> Cc: wei.liu@xxxxxxxxxx; KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
> <haiyangz@xxxxxxxxxxxxx>; Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>;
> bhelgaas@xxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx;
> linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Michael Kelley (LINUX)
> <mikelley@xxxxxxxxxxxxx>; robh@xxxxxxxxxx; kw@xxxxxxxxx; Jake Oshins
> <jakeo@xxxxxxxxxxxxx>
> Subject: Re: [PATCH] PCI: hv: Do not set PCI_COMMAND_MEMORY to reduce
> VM boot time
> On Tue, Apr 19, 2022 at 03:00:07PM -0700, Dexuan Cui wrote:
> > A VM on Azure can have 14 GPUs, and each GPU may have a huge MMIO
> BAR,
> > e.g. 128 GB. Currently the boot time of such a VM can be 4+ minutes, and
> > most of the time is used by the host to unmap/map the vBAR from/to pBAR
> > when the VM clears and sets the PCI_COMMAND_MEMORY bit: each
> unmap/map
> > operation for a 128GB BAR needs about 1.8 seconds, and the pci-hyperv
> > driver and the Linux PCI subsystem flip the PCI_COMMAND_MEMORY bit
> > eight times (see pci_setup_device() -> pci_read_bases() and
> > pci_std_update_resource()), increasing the boot time by 1.8 * 8 = 14.4
> > seconds per GPU, i.e. 14.4 * 14 = 201.6 seconds in total.
> >
> > Fix the slowness by not turning on the PCI_COMMAND_MEMORY in
> pci-hyperv.c,
> > so the bit stays in the off state before the PCI device driver calls
> > pci_enable_device(): when the bit is off, pci_read_bases() and
> > pci_std_update_resource() don't cause Hyper-V to unmap/map the vBARs.
> > With this change, the boot time of such a VM is reduced by
> > 1.8 * (8-1) * 14 = 176.4 seconds.
> I believe you need to clarify this commit message. It took me a while
> to understand what you are really doing.
> What this patch is doing is bootstrapping PCI devices with command
> memory clear because there is no need to have it set (?) in the first
> place and because, if it is set, the PCI core layer needs to toggle it
> on and off in order to eg size BAR regions, which causes the slow down
> you are reporting.
> I assume, given the above, that there is strictly no need to have
> devices with command memory set at kernel startup handover and if
> there was it would not be set in the PCI Hyper-V host controller
> driver (because that's what you are _removing_).
> I think this should not be merged as a fix and I'd be careful
> about possible regressions before sending it to stable kernels,
> if you wish to do so.
> It is fine by me to go via the Hyper-V tree even though I don't
> see why that's better, unless you want to send it as a fix and
> I think you should not.
> You can add my tag but the commit log should be rewritten and
> you should add a Link: to the discussion thread.
> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>

Thanks, Lorenzo! I'll post v2 with the commit message revised.
It's ok to me to have this patch go through the hyperv-next branch
rather than hyperv-fixes.