Re: [6.12.y regression] Regression with 58130e7ce6cb ("PCI/ERR: Ensure error recoverability at all times"): echo vfio-pci >driver_override does not work for DVB Adapter
From: Lukas Wunner
Date: Wed Apr 01 2026 - 00:12:02 EST
On Tue, Mar 31, 2026 at 05:01:49PM -0600, Alex Williamson wrote:
> On Tue, 31 Mar 2026 15:09:34 +0200 Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > On Mon, Mar 30, 2026 at 08:14:53AM +0200, Bernd Schumacher wrote:
> > > [ 0.318903] pci 0000:07:00.0: [dd01:0003] type 00 class 0x048000 PCIe Endpoint
> > > [ 0.318939] pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]
> >
> > BIOS initially sets the BAR address to an incorrect value (the top 32 bits
> > should be all zeroes instead of all ones)...
> >
> > > [ 0.339685] pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]: can't claim; no compatible bridge window
> > [...]
> > > [ 0.311065] pci 0000:02:03.0: [1022:57a3] type 01 class 0x060400 PCIe Switch Downstream Port
> > > [ 0.311107] pci 0000:02:03.0: PCI bridge to [bus 07]
> > > [ 0.311118] pci 0000:02:03.0: bridge window [mem 0xfc500000-0xfc5fffff]
> >
> > ... this doesn't fit into the window of the bridge above the DVB card,
> > which has the top 32 bits set to all zeroes...
> >
> > > [ 0.357346] pci 0000:07:00.0: BAR 0 [mem 0xfc500000-0xfc50ffff 64bit]: assigned
> >
> > ... the kernel fixes the incorrect BAR, but it seems there's an ordering
> > issue such that pci_save_state() is called beforehand. It's weird that
> > this doesn't occur with newer kernels and it would be good to understand why.
> > I'm not seeing the ordering issue despite staring at the code for a while.
>
> Do we know this isn't occurring on newer kernels?

Yes, the reporter tested 6.19.8 and the issue does not occur there:
https://bugs.debian.org/1131025

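To make the BAR mismatch quoted above concrete, here is a small userspace C sketch (not kernel code) of the kind of containment test behind "can't claim; no compatible bridge window", similar in spirit to the kernel's resource_contains().  The helpers range_fits() and low32() are hypothetical, plain integers stand in for struct resource, and the addresses are the ones from the quoted dmesg lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, userspace only: plain integers stand in for
 * struct resource.  Returns true if [start, end] lies entirely inside
 * the bridge window [wstart, wend].
 */
bool range_fits(uint64_t start, uint64_t end, uint64_t wstart, uint64_t wend)
{
	assert(start <= end && wstart <= wend);	/* well-formed ranges only */
	return start >= wstart && end <= wend;
}

/* Hypothetical helper: drop the upper 32 bits of an address. */
uint64_t low32(uint64_t addr)
{
	return addr & 0xffffffffULL;
}
```

With the values from the log, range_fits(0xfffffffffc500000, 0xfffffffffc50ffff, 0xfc500000, 0xfc5fffff) is false because the end of the BAR lies far above the window, while the same range passed through low32() fits, which matches the address the kernel later assigns.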
> AIUI, we're saving the state via the call chain invoked by
> subsys_initcall(pcibios_init), but I think we're doing the resource
> fixes in fs_initcall(pcibios_assign_resources). That suggests that
> the saved state would have the bogus BAR values.

Hm, seems like a valid observation.

But a call to pci_bus_add_devices() is generally preceded by a call to
pci_assign_unassigned_root_bus_resources(), see e.g. pci_host_probe()
or acpi_pci_root_add().  The latter is what's usually used on x86,
whereas pcibios_init() (actually I think you meant pci_subsys_init())
is for legacy PCI initialization on x86.

Perhaps you're right and the correction of the BAR value happens in
the fs_initcall.  We should be able to confirm that once the reporter
has tested the debug patch I provided, which inserts a dump_stack()
in the BAR correction codepath as well as in pci_save_state().

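The suspected ordering problem can be modelled in a few lines of userspace C.  struct model_dev and the model_*() helpers are hypothetical stand-ins for struct pci_dev, pci_save_state(), pci_restore_state() and the BAR correction; they track only the one BAR and do not attempt to reproduce every side effect of the real functions.  If the snapshot is taken before the correction, a later restore writes the bogus value back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one BAR plus a saved copy; not struct pci_dev. */
struct model_dev {
	uint64_t bar;		/* value currently programmed in the BAR */
	uint64_t saved_bar;	/* value captured by model_save_state() */
	bool state_saved;
};

/* Stand-in for pci_save_state(): snapshot the current BAR value. */
void model_save_state(struct model_dev *d)
{
	d->saved_bar = d->bar;
	d->state_saved = true;
}

/* Stand-in for pci_restore_state(): write the snapshot back, if any. */
void model_restore_state(struct model_dev *d)
{
	if (!d->state_saved)
		return;
	d->bar = d->saved_bar;
}

/* Stand-in for the BAR correction done during resource assignment. */
void model_fix_bar(struct model_dev *d, uint64_t good)
{
	d->bar = good;
}
```

Saving before model_fix_bar() and restoring later puts 0xfffffffffc500000 back into the BAR; correcting first and saving afterwards makes the restore harmless.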
> If we toss PM runtime into that mix, pci_pm_default_resume_early() will
> call pci_restore_state(); however, pci_save_state() in that file is
> mostly wrapped around pci_dev->state_saved guards.

The state_saved guards only serve to recognize whether the driver
called pci_save_state() on suspend.  If it did not, the PCI core
calls pci_save_state() itself.

Thanks for taking a look!
Lukas