Re: Boot failure on gru-scarlet-inx with 5.9-rc2

From: Samuel Dionne-Riel
Date: Mon Aug 31 2020 - 03:18:48 EST


On Sun, 30 Aug 2020 10:41:42 +0100
Marc Zyngier <maz@xxxxxxxxxx> wrote:

Hi,

>
> Could you try replacing the problematic patch with [1], and let me
> know whether this changes anything on your end? This patch probably
> isn't the right approach, but it would certainly help pointing me
> in the right direction.
>
> [1]
> https://lore.kernel.org/lkml/20200815125112.462652-2-maz@xxxxxxxxxx/

Following through a bisect session to figure out why the Wi-Fi broke
between 5.8 and 5.9-rc1, I figured out something that you might have in
mind already.

It seems that anything that makes of_bus_pci_match return true will
cause this to happen. This is why your initial fix also fails.

I believe my understanding is right since applying the following on top
of 5.9-rc1 also produces the same result.

--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -227,6 +227,7 @@ dmac_peri: dma-controller@ff6e0000 {
 	};
 
 	pcie0: pcie@f8000000 {
+		device_type = "pci";
 		compatible = "rockchip,rk3399-pcie";
 		reg = <0x0 0xf8000000 0x0 0x2000000>,
 		      <0x0 0xfd000000 0x0 0x1000000>;

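As a rough illustration of why adding device_type alone is enough to
change the result, here is a minimal user-space sketch in C. It is not
the kernel code: struct fake_node, fake_node_is_type and
fake_bus_pci_match are hypothetical names, and the list of accepted
device_type strings is an assumption about what of_bus_pci_match tests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device_node: only the fields needed here. */
struct fake_node {
	const char *name;
	const char *device_type;	/* NULL when the property is absent */
};

/* Models a device_type comparison: true only if the property exists and matches. */
static bool fake_node_is_type(const struct fake_node *np, const char *type)
{
	return np && np->device_type && strcmp(np->device_type, type) == 0;
}

/*
 * Models the match decision under the assumption that it depends only on
 * device_type being one of a few PCI-like strings.
 */
static bool fake_bus_pci_match(const struct fake_node *np)
{
	return fake_node_is_type(np, "pci") || fake_node_is_type(np, "pciex") ||
	       fake_node_is_type(np, "vci") || fake_node_is_type(np, "ht");
}
```

In this model, a pcie0 node without device_type does not match, and the
same node with device_type = "pci" does, which mirrors the effect of the
one-line dts change above.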

This was found because the PCI-based ath10k Wi-Fi broke with
2f96593ecc37e98bf99525f0629128080533867f, which changes things around
the PCI bus.

Am I right in understanding that your fix(es) relate to the change set
that contains this commit?

My intuition is that the commit causing the boot issue could be related
to changes in the PCI or PCIe subsystems, and that your fix for
of_bus_pci_match is a red herring that only surfaced an existing
issue.

This is backed by applying the dts patch above on top of 2f96593e,
and having Wi-Fi work. I would assume that between that commit and
5.9-rc1 there is a commit that causes the complete failure to boot,
and that it is unrelated to the commit first identified on 5.9-rc2.

It is also backed by a further bisection, with the dts patch above
applied, that points to d84c572de1a360501d2e439ac632126f5facf59d as the
actual change that causes the tablet to fail to boot, as long as the
pcie0 node is properly identified as pci.

I am unsure whether I should Cc everyone involved in that change
set, though the author (coincidentally) is already in the original list
of recipients.

Any thoughts given this additional information?

--
Samuel Dionne-Riel