Re: [PATCH v3 1/1] PCI: Fix bug resulting in double hpmemsize being assigned to MMIO window

From: Nicholas Johnson
Date: Thu Nov 21 2019 - 09:52:51 EST


On Tue, Nov 19, 2019 at 07:38:28AM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 19, 2019 at 03:17:04AM +0000, Nicholas Johnson wrote:
> > I did just discover linux-next and I built it. Should I be doing this
> > more often to help find regressions?
>
> Yes, if you build and run linux-next, that's a great service because
> it helps find problems before they appear in mainline.

Funnily enough, I just built linux-next next-20191121 and it hits a NULL
pointer dereference at boot, which renders the system unusable.

Can anybody else please confirm? I enabled most of the config options
that are new since the previous linux-next I built a few days ago.

I compiled it on an i7-4770K, booting from my USB SSD. I suppose
there is a tiny chance that the CPU had an error and produced bad code.
It is not my machine, and it was pegged at 100 degrees Celsius the whole
time. I do find it hard to believe that I am the first to notice it,
though. I cannot find any bug reports on this.

If this turns out to be an actual bug, is there a preferred way to
report it? It is probably not from the PCI subsystem.

I can bisect, but that takes a lot of time on a slow system.

Here is a preliminary bug report (assuming you are meant to report
linux-next bugs here):
https://bugzilla.kernel.org/show_bug.cgi?id=205621

Cheers!

Regards,
Nicholas Johnson

>
> > I will now concentrate on fixing the problem where pci=nocrs does not
> > ignore the bus resource. One motherboard I own gives 00-7e or similar,
> > instead of 00-ff. The nocrs does not help, and I had to patch the kernel
> > myself. Only acpi=off fixes the problem, while knocking out SMT (MADT),
> > IOMMU (DMAR) and the ability to suspend without crashing.
> >
> > If you disagree that nocrs should override bus resource, then let me
> > know and I will not attempt this.
>
> I guess the problem is that with "pci=nocrs", we ignore the MMIO and
> I/O port resources from _CRS, but we still pay attention to bus number
> resources in _CRS? That does sound like it would be unexpected
> behavior.
>
> I *would* like to see the complete dmesg log because these _CRS
> methods are pretty reliable because Windows relies on them as well, so
> problems are frequently a result of Linux defects. If we can fix
> Linux or automatically work around issues so users don't have to use
> "pci=nocrs" explicitly, that's the best.
>
> Bjorn