Re: Boot failure on gru-scarlet-inx with 5.9-rc2
From: Samuel Dionne-Riel
Date: Tue Sep 01 2020 - 14:34:24 EST
On Tue, 1 Sep 2020 17:42:49 +0100
Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx> wrote:
> On Tue, Sep 01, 2020 at 04:37:42PM +0100, Marc Zyngier wrote:
> > On 2020-09-01 04:45, Samuel Dionne-Riel wrote:
> > > - if (pci_is_root_bus(bus->parent) && dev > 0)
> > > + if (bus->primary == rockchip->root_bus_nr && dev > 0)
>
> Can you dump bus->primary when this condition is hit please ?
With the following diff
---
@@ -79,6 +79,8 @@ static int rockchip_pcie_valid_device(struct rockchip_pcie *rockchip,
* do not read more than one device on the bus directly attached
* to RC's downstream side.
*/
+ printk("[!!] // bus->parent (%d)\n", bus->parent);
+ printk("[!!] bus->primary (%d) == rockchip->root_bus_nr (%d) && dev (%d) > 0\n", bus->primary, rockchip->root_bus_nr, dev);
if (bus->primary == rockchip->root_bus_nr && dev > 0)
return 0;
--
I get two kind of results
[ 1.692913] [!!] // bus->parent (0)
[ 1.692917] [!!] bus->primary (0) == rockchip->root_bus_nr (0) && dev (0) > 0
and
[ 1.693055] [!!] // bus->parent (-256794624)
[ 1.693058] [!!] bus->primary (0) == rockchip->root_bus_nr (0) && dev (0) > 0
What I understand from my logging here is that in some situations the
new check from d84c572d checks against two different values, while
before d84c572d all checks at this location were equivalent.
Note that I have no idea what bus->parent's semantics are compared to
the other condition. I only thought about logging all the information
relevant to this condition.
> Also on a working system (ie prior to regression) please drop the
> output of:
>
> lspci -t
>
$ lspci -t
-[0000:00]---00.0-[01]----00.0
> > >
> > > + /* HACK; ~equiv to last param of
> > > pci_parse_request_of_pci_ranges */
> > > + bus_res = (resource_list_first_type(&bridge->windows,
> > > IORESOURCE_MEM))->res;
>
> IORESOURCE_MEM ? I am a bit puzzled by this hack, what is it supposed
> to do ?
It's not really supposed to do anything. I only needed access to
bus_res for bus_res->start to keep as root_bus_nr. My complete lack of
familiarity with all of this meant that I simply borrowed something
that was in use in another function to give me the bus_res.
Note that, while this hack is ugly, this was at first tested directly
against d84c572d, no hack needed. Against d84c572d, reverting the
change for the second call to pci_is_root_bus only made it work fine.
This is the equivalent (possibly bad) change.
> > Hmmm. It seems that the original commit (d84c572d) considered that
> > root_bus_nr was always zero, while it may not have been.
> >
> > Rob, Lorenzo: do you guys have any idea what is going on here?
>
> That's a possibility - it would also be useful to have a look at
> the DTS to check the bus-range property.
The DTS is arch/arm64/boot/dts/rockchip/rk3399-gru-scarlet-inx.dts,
systematically synced-up with the kernel source used to build the
running kernel.
Which, AFAICT, only has a bus-range set in
arch/arm64/boot/dts/rockchip/rk3399.dtsi.
pcie0: pcie@f8000000 {
[...]
bus-range = <0x0 0x1f>;
[...]
}
Thanks again!
--
Samuel Dionne-Riel