Re: [PATCH] PCI: xgene: Fix IB window setup

From: Thorsten Leemhuis
Date: Sun Feb 06 2022 - 04:52:22 EST


[TLDR: I'm adding the regression report below to regzbot, the Linux
kernel regression tracking bot; nearly all text you find below is
compiled from a few template paragraphs you have likely already
encountered in mails similar to this one.]

Hi, this is your Linux kernel regression tracker speaking.

CCing the regression mailing list, as it should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

On 05.02.22 00:01, dann frazier wrote:
> On Mon, Nov 29, 2021 at 11:36:37AM -0600, Rob Herring wrote:
>> Commit 6dce5aa59e0b ("PCI: xgene: Use inbound resources for setup")
>> broke PCI support on XGene. The cause is the IB resources are now sorted
>> in address order instead of being in DT dma-ranges order. The result is
>> which inbound registers are used for each region are swapped. I don't
>> know the details about this h/w, but it appears that IB region 0
>> registers can't handle a size greater than 4GB. In any case, limiting
>> the size for region 0 is enough to get back to the original assignment
>> of dma-ranges to regions.
>
> hey Rob!
>
> I've been seeing a panic on HP Moonshot m400 cartridges (X-Gene1) -
> only during network installs - that I also bisected down to commit
> 6dce5aa59e0b ("PCI: xgene: Use inbound resources for setup"). I was
> hoping that this patch that fixed the issue on Stéphane's X-Gene2
> system would also fix my issue, but no luck. In fact, it seems to just
> make it fail differently. Reverting both patches is required to get a
> v5.17-rc kernel to boot.
>
> I've collected the following logs - let me know if anything else would
> be useful.
>
> 1) v5.17-rc2+ (unmodified):
> http://dannf.org/bugs/m400-no-reverts.log
> Note that the mlx4 driver fails initialization.
>
> 2) v5.17-rc2+, w/o the commit that fixed Stéphane's system:
> http://dannf.org/bugs/m400-xgene2-fix-reverted.log
> Note the mlx4 MSI-X timeout, and later panic.
>
> 3) v5.17-rc2+, w/ both commits reverted (works)
> http://dannf.org/bugs/m400-both-reverted.log

Thanks for the report.

To be sure this issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced c7a75d07827a1f33d
#regzbot title Follow-up error for the commit fixing "PCIe regression on
APM Merlin (aarch64 dev platform) preventing NVME initialization"
#regzbot ignore-activity

Reminder for developers: when fixing the issue, please add 'Link:'
tags pointing to the report (the mail quoted above) using
lore.kernel.org/r/, as explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'. This allows the bot to connect
the report with any fixes posted or committed, to always show the
current status of things, and to automatically close the issue when the
fix hits the right tree.
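
As a rough illustration of what such a tag looks like, here is a minimal
Python sketch that turns a mail's Message-ID into a 'Link:' line. The
function name link_trailer and the example Message-ID are hypothetical,
and percent-encoding everything except '@' is an assumption made to keep
characters like '/' from breaking the URL; it is not taken from the
kernel documentation.

```python
from urllib.parse import quote


def link_trailer(message_id: str) -> str:
    """Build a 'Link:' line for a patch description from a Message-ID.

    Hypothetical helper for illustration only; it is not part of
    regzbot or of any kernel tooling.
    """
    # Drop surrounding whitespace and the angle brackets used in mail headers.
    msgid = message_id.strip().strip("<>")
    # Assumption: percent-encode reserved characters such as '/', keep '@'.
    return "Link: https://lore.kernel.org/r/" + quote(msgid, safe="@")


# The Message-ID below is a made-up placeholder, not the real report.
print(link_trailer("<20220101.example@host.invalid>"))
```

The resulting line would go into the patch description next to the other
tags, with one such line per report the patch addresses.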

I'm sending this to everyone that got the initial report, to make them
aware of the tracking. I also hope that messages like this motivate
people to directly get at least the regression mailing list and ideally
even regzbot involved when dealing with regressions, as messages like
this wouldn't be needed then.

Don't worry, I'll send further messages wrt this regression just to
the lists (with a tag in the subject so people can filter them away), if
they are relevant just for regzbot. With a bit of luck no such messages
will be needed anyway.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

--
Additional information about regzbot:

If you want to know more about regzbot, check out its web-interface, the
getting started guide, and the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if you want to.

Hint for reporters: when reporting a regression, it's in your interest
to CC the regression list and tell regzbot about the issue, as that
ensures the regression makes it onto the radar of the Linux kernel's
regression tracker and won't fall through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would, just remember to
include 'Link:' tags in the patch description pointing to all reports
about the issue. This has been expected from developers even before
regzbot showed up, for reasons explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'.