Re: [PATCH v5 2/8] dt-bindings: phy: Add bindings for HiKey 970 PCIe PHY

From: Mauro Carvalho Chehab
Date: Tue Jul 27 2021 - 04:11:32 EST


Hi Mani,

Em Wed, 14 Jul 2021 23:12:25 +0530
Manivannan Sadhasivam <mani@xxxxxxxxxx> escreveu:

> I'm not sure about this. That fact that the PCIe device's PERST# signal
> wired to different GPIOs doesn't mean that those GPIOs belong to the PHY.
> Those GPIOs should be independent of the PCIe core controlled manually
> by the driver.
>
> I think this issue is somewhat similar to the one we are dealing on the
> Qcom platforms [1] where each PCIe device uses a different GPIO and voltage
> config to operate. And those need to be active for the link training to
> succeed.
>
> So perhaps we should aim for a common solution? The GPIO and voltage
> layout should be described in DT for each port exposed by the SoC/board.
>
> Thanks,
> Mani
>
> [1] https://lkml.org/lkml/2021/6/21/1524

After re-visiting this issue, I'm starting to think that this should
be mapped as something similar to:

pcie@xxxx {
...
slot {
slot#1 {
// clock, power supply, reset pins, etc
}
slot#2 {
// clock, power supply, reset pins, etc
}
...
}
};

E. g. placing each specific PCIe device requirement inside the pcie
or phy, as it should be up to the driver to initialize each PCIe
child-specific requirements when the hardware is ready for that.

---

A longer explanation why this should be initialized during PHY
power on sequence:

On my tests with Kirin 970, there are some steps to be done before
enabling the clocks and sending PERST# signals, plus some extra
steps to run after PERST# is sent to all devices.

While playing with PHY split, I noticed that Linux and/or the SoC
is very sensitive to an specific probing order. If such order is
not followed, an ARM SError happens and the Kernel panics
with something similar to:

[ 1.837458] SError Interrupt on CPU0, code 0xbf000002 -- SError
[ 1.837462] CPU: 0 PID: 74 Comm: kworker/0:1 Not tainted 5.8.0+ #205
[ 1.837463] Hardware name: HiKey970 (DT)
[ 1.837465] Workqueue: events deferred_probe_work_func
[ 1.837467] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--)
[ 1.837468] pc : _raw_spin_unlock_irqrestore+0x18/0x50
[ 1.837469] lr : regmap_unlock_spinlock+0x14/0x20
...
[ 1.837507] Kernel panic - not syncing: Asynchronous SError Interrupt


One example is with regards to the clocks required for the PCIe
to work:

clocks = <&crg_ctrl HI3670_CLK_GATE_PCIEPHY_REF>,
<&crg_ctrl HI3670_CLK_GATE_PCIEAUX>,
<&crg_ctrl HI3670_PCLK_GATE_PCIE_PHY>,
<&crg_ctrl HI3670_PCLK_GATE_PCIE_SYS>,
<&crg_ctrl HI3670_ACLK_GATE_PCIE>;

If them aren't initialized at the expected order, the Kernel
hangs. The same applies to the slot-specific clocks.

So, basically, the driver needs to initialize them on this
sequence:

1. PHY ref clock;
2. APB sys and phy clock;
3. aclk and aux_clk;
<some settings at the PHY hardware>
4. slot-specific clocks.

failing to follow a valid power-on sequence crashes the Kernel.


Thanks,
Mauro