Re: [PATCH] KVM: x86: Use MMCONFIG for all PCI config space accesses

From: Julia Suvorova
Date: Fri Jul 31 2020 - 14:23:55 EST


On Fri, Jul 31, 2020 at 11:22 AM Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>
> Andy Shevchenko <andy.shevchenko@xxxxxxxxx> writes:
>
> > On Thu, Jul 30, 2020 at 10:37 PM Julia Suvorova <jusual@xxxxxxxxxx> wrote:
> >>
> >> Using MMCONFIG instead of I/O ports cuts the number of config space
> >> accesses in half, which is faster on KVM and opens the door for
> >> additional optimizations such as Vitaly's "[PATCH 0/3] KVM: x86: KVM
> >> MEM_PCI_HOLE memory":
> >
> >> https://lore.kernel.org/kvm/20200728143741.2718593-1-vkuznets@xxxxxxxxxx
> >
> > You may use Link: tag for this.
> >
> >> However, this change will not bring significant performance improvement
> >> unless it is running on x86 within a hypervisor. Moreover, allowing
> >> MMCONFIG access for addresses < 256 can be dangerous for some devices:
> >> see commit a0ca99096094 ("PCI x86: always use conf1 to access config
> >> space below 256 bytes"). That is why a special feature flag is needed.
> >>
> >> Introduce KVM_FEATURE_PCI_GO_MMCONFIG, which can be enabled when the
> >> configuration is known to be safe (e.g. in QEMU).
> >
> > ...
> >
> >> +static int __init kvm_pci_arch_init(void)
> >> +{
> >> +	if (raw_pci_ext_ops &&
> >> +	    kvm_para_has_feature(KVM_FEATURE_PCI_GO_MMCONFIG)) {
> >
> > Better to use traditional pattern, i.e.
> > if (not_supported)
> > return bail_out;
> >
> > ...do useful things...
> > return 0;
> >
> >> +		pr_info("PCI: Using MMCONFIG for base access\n");
> >> +		raw_pci_ops = raw_pci_ext_ops;
> >> +		return 0;
> >> +	}
> >
> >> +	return 1;
> >
> > Hmm... I don't remember what positive codes mean there. Perhaps you
> > need to return an error code instead?
>
> If I'm reading the code correctly,
>
> pci_arch_init() has the following:
>
>         if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
>                 return 0;
>
>
> so returning '1' here means 'continue' and this seems to be
> correct. (E.g. Hyper-V's hv_pci_init() does the same). What I'm not sure
> about is 'return 0' above as this will result in skipping the rest of
> pci_arch_init(). Was this desired or should we return '1' in both cases?

This is intentional: the rest of pci_arch_init() calls pci_direct_init(),
which would overwrite the raw_pci_ops assignment made above. And since
QEMU doesn't have anything in pciprobe_dmi_table, it is safe to skip the
rest of the function.
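
To make the return-value convention concrete, here is a minimal user-space
sketch of the control flow discussed above. It is not kernel code: every
mock_* name is hypothetical, the feature check is reduced to a boolean, and
pci_direct_init() is assumed to simply install the conf1 accessors. The hook
is written with the early bail-out pattern suggested in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's config space accessor tables. */
struct mock_pci_ops { int unused; };

static const struct mock_pci_ops mock_conf1;     /* I/O port (conf1) access */
static const struct mock_pci_ops mock_mmconfig;  /* MMCONFIG access */

static const struct mock_pci_ops *mock_raw_pci_ops;
static const struct mock_pci_ops *mock_raw_pci_ext_ops;

/* Models kvm_para_has_feature(KVM_FEATURE_PCI_GO_MMCONFIG). */
static bool mock_has_feature;
static bool mock_direct_init_ran;

/* Models kvm_pci_arch_init() from the patch, with an early bail-out. */
static int mock_kvm_pci_arch_init(void)
{
	if (!mock_raw_pci_ext_ops || !mock_has_feature)
		return 1;	/* nonzero: caller continues with default setup */

	mock_raw_pci_ops = mock_raw_pci_ext_ops;
	return 0;		/* zero: caller returns early, ops are kept */
}

/* Assumed behaviour of pci_direct_init(): it installs conf1 accessors. */
static void mock_pci_direct_init(void)
{
	mock_direct_init_ran = true;
	mock_raw_pci_ops = &mock_conf1;
}

/* Models the check quoted from pci_arch_init(). */
static int mock_pci_arch_init(void)
{
	if (!mock_kvm_pci_arch_init())
		return 0;

	mock_pci_direct_init();
	return 0;
}

/* Reset the mock state, run the init path once, report which ops won. */
static bool mock_uses_mmconfig(bool mmconfig_available, bool feature)
{
	mock_raw_pci_ops = NULL;
	mock_raw_pci_ext_ops = mmconfig_available ? &mock_mmconfig : NULL;
	mock_has_feature = feature;
	mock_direct_init_ran = false;

	mock_pci_arch_init();
	assert(mock_raw_pci_ops != NULL);
	return mock_raw_pci_ops == &mock_mmconfig;
}
```

In this sketch mock_uses_mmconfig(true, true) ends up with the MMCONFIG
accessors because the hook returns 0 and the overwrite never runs. If the
hook returned 1 in both cases, mock_pci_direct_init() would always run and
replace them with conf1.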

Best regards, Julia Suvorova.