Re: Kernel panic while doing vfio-pci hot-plug/unplug test
From: Matthew Wilcox
Date: Wed Oct 23 2019 - 12:38:55 EST
On Wed, Oct 23, 2019 at 10:15:40AM -0500, Bjorn Helgaas wrote:
> I don't like being one of a handful of callers of __add_wait_queue(),
> so I like that solution from that point of view.
>
> The 7ea7e98fd8d0 ("PCI: Block on access to temporarily unavailable pci
> device") commit log suggests that using __add_wait_queue() is a
> significant optimization, but I don't know how important that is in
> practical terms. Config accesses are never a performance path anyway,
> so I'd be inclined to use add_wait_queue() unless somebody complains.
Wow, this has got pretty messy in the umpteen years since I last looked
at it.
Some problems I see:
1. Commit df65c1bcd9b7b639177a5a15da1b8dc3bee4f5fa (tglx) says:

    x86/PCI: Select CONFIG_PCI_LOCKLESS_CONFIG

    All x86 PCI configuration space accessors have either their own
    serialization or can operate completely lockless (ECAM).

    Disable the global lock in the generic PCI configuration space accessors.

The concept behind this patch is broken. We still need to lock out
config space accesses when devices are undergoing D-state transitions.
I would suggest that for the contention case that tglx is concerned about,
we should have a pci_bus_read_config_unlocked_##size set of functions
which can be used for devices we know never go into D states.
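
As a rough illustration of what such a family could look like, here is a userspace sketch, not kernel code. Everything prefixed fake_ or FAKE_ is hypothetical: struct fake_bus stands in for the real bus and its ops, and the macro only mimics the general shape of macro-generated config accessors. The point is just that one macro stamps out byte/word/dword readers that skip the lock entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a bus; the real struct pci_bus is different. */
struct fake_bus {
	uint8_t cfg[256];	/* pretend config space of a single function */
};

/* Hypothetical raw accessor: assembles a little-endian value, no locking. */
static int fake_raw_read(struct fake_bus *bus, unsigned int devfn,
			 int where, int size, uint32_t *val)
{
	int i;

	(void)devfn;	/* only one function is modelled in this sketch */
	assert(size == 1 || size == 2 || size == 4);
	if (where < 0 || where + size > (int)sizeof(bus->cfg))
		return -1;

	*val = 0;
	for (i = 0; i < size; i++)
		*val |= (uint32_t)bus->cfg[where + i] << (8 * i);
	return 0;
}

/*
 * Generates fake_bus_read_config_unlocked_{byte,word,dword}().
 * No lock is taken, so this is only safe for devices that are assumed
 * never to have their config access blocked.
 */
#define FAKE_BUS_READ_CONFIG_UNLOCKED(size, type, len)			\
int fake_bus_read_config_unlocked_##size(struct fake_bus *bus,		\
		unsigned int devfn, int where, type *value)		\
{									\
	uint32_t data = 0;						\
	int ret;							\
									\
	if (where & (len - 1))						\
		return -1;	/* misaligned register offset */	\
	ret = fake_raw_read(bus, devfn, where, len, &data);		\
	*value = (type)data;						\
	return ret;							\
}

FAKE_BUS_READ_CONFIG_UNLOCKED(byte, uint8_t, 1)
FAKE_BUS_READ_CONFIG_UNLOCKED(word, uint16_t, 2)
FAKE_BUS_READ_CONFIG_UNLOCKED(dword, uint32_t, 4)
```

A caller that knows its device never changes D-state would pick the unlocked variant explicitly, which keeps the locked accessors as the default for everyone else.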
2. Commit a2e27787f893621c5a6b865acf6b7766f8671328 (Jan Kiszka)
exports pci_lock. I think this is a mistake; at best there should be
accessors for the pci_lock. But I don't understand why it needs to
exclude PCI config space changes throughout pci_check_and_set_intx_mask().
Why can it not do:
- bus->ops->read(bus, dev->devfn, PCI_COMMAND, 4, &cmd_status_dword);
+ pci_read_config_dword(dev, PCI_COMMAND, &cmd_status_dword);
3. I don't understand why 511dd98ce8cf6dc4f8f2cb32a8af31ce9f4ba4a1
changed pci_lock to be a raw spinlock. The patch description
essentially says "We need it for RT" which isn't terribly helpful.
4. Finally, getting back to the original problem report here, I wouldn't
write this code this way today. There's no reason not to use the
regular add_wait_queue etc. BUT! Why are we using this custom locking
mechanism? It pretty much screams to me of an rwsem (reads/writes
of config space take it for read; changes to config space accesses
(disabling and changing of accessor methods) take it for write).