On 13/09/22, 7:05 PM, "Vitaly Kuznetsov" <vkuznets@xxxxxxxxxx> wrote:Thanks Vitaly, for your review and helping us to improve the code.
Thanks Vitaly for your response.I'm not sure I follow, you have two instances of 'struct pci_raw_ops'
1. we have multiple objects of struct pci_raw_ops, 2. adding 'priority' field to struct pci_raw_ops
doesn't seems to be appropriate as need to take decision which object of struct pci_raw_ops has
to be used, not something with-in struct pci_raw_ops.
which are called 'raw_pci_ops' and 'raw_pci_ext_ops'. What if you do
something like (completely untested):
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 70533fdcbf02..fb8270fa6c78 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -116,6 +116,7 @@ extern void (*pcibios_disable_irq)(struct pci_dev *dev);
extern bool mp_should_keep_irq(struct device *dev);
struct pci_raw_ops {
+ int rating;
int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *val);
int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index ddb798603201..e9965fd11576 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -40,7 +40,8 @@ const struct pci_raw_ops *__read_mostly raw_pci_ext_ops;
int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *val)
{
- if (domain == 0 && reg < 256 && raw_pci_ops)
+ if (domain == 0 && reg < 256 && raw_pci_ops &&
+ (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= raw_pci_ops->rating))
return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
if (raw_pci_ext_ops)
return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
@@ -50,7 +51,8 @@ int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
int reg, int len, u32 val)
{
- if (domain == 0 && reg < 256 && raw_pci_ops)
+ if (domain == 0 && reg < 256 && raw_pci_ops &&
+ (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= raw_pci_ops->rating))
return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
if (raw_pci_ext_ops)
return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
and then somewhere in Vmware hypervisor initialization code
(arch/x86/kernel/cpu/vmware.c) you do
raw_pci_ext_ops->rating = 100;
I was working to make changes as you suggested, but before sending v3 would like to
discuss on following:
If we add rating with-in struct pci_raw_ops then we can't have pci_mmcfg as const,
and following change is must in arch/x86/pci/mmconfig_64.c:
-const struct pci_raw_ops pci_mmcfg = {
+struct pci_raw_ops pci_mmcfg = {
.read = pci_mmcfg_read,
.write = pci_mmcfg_write,
};
So to avoid this change, is it fine to have global bool prefer_raw_pci_ext_ops?
And raw_pci_read() will have following change:
- if (domain == 0 && reg < 256 && raw_pci_ops)
+ if (domain == 0 && reg < 256 && raw_pci_ops &&
+ (!prefer_raw_pci_ext_ops || !raw_pci_ext_ops)
why wouldn't it work?Ok, as of now we will make this change specific to VMware hypervisor.
(diclaimer: completely untested, raw_pci_ops/raw_pci_ext_ops
initialization has to be checked so 'rating' is not garbage).
It's a generic solution for all hypervisor (sorry for earlier wrongThat's the tricky part. We can check modern hypervisor versions, but
Subject), not specific to VMware. Further looking for feedback if it's
impacting to any hypervisor.
what about all other versions in existence? How can we know that there's
no QEMU/Hyper-V/... version out there where MMIO path is broken? I'd
suggest we limit the change to Vmware hypervisor, other hypervisors may
use the same mechanism (like the one above) later (but the person
suggesting the patch is always responsible for the research why it is
safe to do so).