Re: Kernel panic while doing vfio-pci hot-plug/unplug test

From: Thomas Gleixner
Date: Thu Oct 24 2019 - 04:07:30 EST

Next message: Hans de Goede: "Re: [PATCH resend] Add touchscreen platform data for the Schneider SCT101CTM tablet"
Previous message: Omer Shpigelman: "RE: [PATCH v2 1/2] habanalabs: support kernel memory mapping"
Next in thread: Matthew Wilcox: "Re: Kernel panic while doing vfio-pci hot-plug/unplug test"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 23 Oct 2019, Matthew Wilcox wrote:
> Some problems I see:
>
> 1. Commit df65c1bcd9b7b639177a5a15da1b8dc3bee4f5fa (tglx) says:
>
> x86/PCI: Select CONFIG_PCI_LOCKLESS_CONFIG
>
> All x86 PCI configuration space accessors have either their own
> serialization or can operate completely lockless (ECAM).
>
> Disable the global lock in the generic PCI configuration space accessors.
>
> The concept behind this patch is broken. We still need to lock out
> config space accesses when devices are undergoing D-state transitions.
> I would suggest that for the contention case that tglx is concerned about,
> we should have a pci_bus_read_config_unlocked_##size set of functions
> which can be used for devices we know never go into D states.

I don't think that it's broken. A D-state transition has to make sure that
the rest of stuff which might be touching the config space is
quiescent. pci_lock cannot provide that protection

>
> 2. Commit a2e27787f893621c5a6b865acf6b7766f8671328 (jan kiszka)
> exports pci_lock. I think this is a mistake; at best there should be
> accessors for the pci_lock. But I don't understand why it needs to
> exclude PCI config space changes throughout pci_check_and_set_intx_mask().
> Why can it not do:
>
> - bus->ops->read(bus, dev->devfn, PCI_COMMAND, 4, &cmd_status_dword);
> + pci_read_config_dword(dev, PCI_COMMAND, &cmd_status_dword);

Hmm. Need to look closer on that.

> 3. I don't understand why 511dd98ce8cf6dc4f8f2cb32a8af31ce9f4ba4a1
> changed pci_lock to be a raw spinlock. The patch description
> essentially says "We need it for RT" which isn't terribly helpful.

Yes, I could slap myself for writing such a useless changelog. The reason
why it is a raw spinlock is that config space access happens from very low
level contexts, which require to have interrupts disabled even on RT,
e.g. from the guts of the interrupt code.

> 4. Finally, getting back to the original problem report here, I wouldn't
> write this code this way today. There's no reason not to use the
> regular add_wait_queue etc. BUT! Why are we using this custom locking
> mechanism? It pretty much screams to me of an rwsem (reads/writes
> of config space take it for read; changes to config space accesses
> (disabling and changing of accessor methods) take it for write.

You cannot use a RWSEM as low level interrupt code needs to access the
config space with interrupts disabled and raw spinlocks held, e.g. to
fiddle with the interrupt and MSI stuff.

Thanks,

tglx

Next message: Hans de Goede: "Re: [PATCH resend] Add touchscreen platform data for the Schneider SCT101CTM tablet"
Previous message: Omer Shpigelman: "RE: [PATCH v2 1/2] habanalabs: support kernel memory mapping"
Next in thread: Matthew Wilcox: "Re: Kernel panic while doing vfio-pci hot-plug/unplug test"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]