Re: [PATCH V7 0/3] Generate device tree node for pci devices

From: Rob Herring
Date: Thu Mar 30 2023 - 11:21:53 EST


On Wed, Mar 29, 2023 at 11:51 AM Frank Rowand <frowand.list@xxxxxxxxx> wrote:
>
> On 3/9/23 02:45, Clément Léger wrote:
> > Le Wed, 8 Mar 2023 01:31:52 -0600,
> > Frank Rowand <frowand.list@xxxxxxxxx> a écrit :
> >
> >> On 3/6/23 18:52, Rob Herring wrote:
> >>> On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <frowand.list@xxxxxxxxx> wrote:
> >>>>
> >>
> >> < snip >
> >>
> >> Hi Rob,
> >>
> >> I am in no position to comment intelligently on your comments until I
> >> understand the SoC on PCI card model I am asking to be described in
> >> this subthread.
> >
> > Hi Frank,
> >
> > Rather than answering all of the assumptions that were made in the upper
> > thread (that are probably doing a bit too much of inference), I will
> > re-explain that from scratch.
>
> Thanks! The below answers a lot of my questions.
> >
> > Our usecase involves the lan966x SoCs. These SoCs are mainly targeting
> > networking application and offers multiple SFP and RGMII interfaces.
> > This Soc can be used in two exclusive modes (at least for the intended
> > usage):
> >
> > SoC mode:
> > The device runs Linux by itself, on ARM64 cores included in the
> > SoC. This use-case of the lan966x is currently almost upstreamed,
> > using a traditional Device Tree representation of the lan996x HW
> > blocks [1] A number of drivers for the different IPs of the SoC have
> > already been merged in upstream Linux (see
> > arch/arm/boot/dts/lan966x.dtsi)
> >
> > PCI mode:
> > The lan966x SoC is configured as a PCIe endpoint (PCI card),
> > connected to a separate platform that acts as the PCIe root complex.
> > In this case, all the IO memories that are exposed by the devices
> > embedded on this SoC are exposed through PCI BARs 0 & 1 and the ARM64
> > cores of the SoC are not used. Since this is a PCIe card, it can be
> > plugged on any platform, of any architecture supporting PCIe.
> >
> > This work only focus on the *PCI mode* usage. In this mode, we have the
> > following prerequisites:
> > - Should work on all architectures (x86, ARM64, etc)
> > - Should be self-contained in the driver
>
> > - Should be able to reuse all existing platform drivers
>
> I've said before (in different words) that using an existing platform
> driver for hardware on a PCI card requires shims, which have been
> strongly rejected by the Linux kernel.

Do you have an example of what you are saying has been rejected
because I have no clue what you are referring to?

The kernel already has a way to divide up a PCI device into multiple
non-PCI sub-drivers. It's auxiliary_bus. Is that not a "shim"? So why
not use that? From the docs:

* A key requirement for utilizing the auxiliary bus is that there is no
* dependency on a physical bus, device, register accesses or regmap support.
* These individual devices split from the core cannot live on the platform bus
* as they are not physical devices that are controlled by DT/ACPI. The same
* argument applies for not using MFD in this scenario as MFD relies on
* individual function devices being physical devices.


In the usecases here, they are physical devices because it's the same
devices when Linux is running on the SoC.

> >
> > In PCI mode, the card runs a firmware (not that it matters at all by
> > the way) which configure the card in PCI mode at boot time. In this
> > mode, it exposes a single PCI physical function associated with
> > vendor/product 0x1055/0x9660. This is not a multi-function PCI device !
> > This means that all the IO memories (peripheral memories, device
> > memories, registers, whatever you call them) are accessible using
> > standard readl()/writel() on the BARs that have been remapped. For
> > instance (not accurate), in the BAR 0, we will have this kind of memory
> > map:
> >
> > BAR0
> > 0x0 ┌───────────┐
> > │ │
> > ├───────────┤
> > │ Clock │
> > │ controller│
> > ├───────────┤
> > │ │
> > ├───────────┤
> > │ I2C │
> > │ controller│
> > ├───────────┤
> > │ │
> > ├───────────┤
> > │ MDIO │
> > │ Controller│
> > ├───────────┤
> > │ │
> > ├───────────┤
> > │ Switch │
> > │ Controller│
> > ├───────────┤
> > │ │
> > │ ... │
> >
> >
> > It also exposes either a single interrupt via the legacy interrupt
> > (which can then be demuxed by reading the SoC internal interrupt
> > controller registers), or multiple interrupts using MSI interrupts.
> >
> > As stated before, all these peripherals are already supported in SoC
> > mode and thus, there are aleready existing platform drivers for each of
> > them. For more information about the devices that are exposed please
> > see link [1] which is the device-tree overlay used to describe the
> > lan9662 card.
> >
> > In order to use the ethernet switch, we must configure everything that
> > lies around this ethernet controller, here are a few amongst all of
> > them:
> > - MDIO bus
> > - I2C controller for SFP modules access
> > - Clock controller
> > - Ethernet controller
> > - Syscon
> >
> > Since all the platform drivers already exist for these devices, we
> > want to reuse them. Multiple solutions were thought of (fwnode, mfd,
> > ACPI, device-tree) and eventually ruled out for some of them and efforts
> > were made to try to tackle that (using fwnode [2], device-tree [3])
> >
>
> > One way to do so is to use a device-tree overlay description that is
> > loaded dynamically on the PCI device OF node. This can be done using the
> > various device-tree series series that have been proposed (included
> > this one). On systems that do not provide a device-tree of_root, create
> > an empty of_root node (see [4]). Then during PCI enumeration, create
> > device-tree node matching the PCI tree that was enumerated (See [5]).
> > This is needed since the PCI card can be plugged on whatever port the
> > user wants and thus it can not be statically described using a fixed
> > "target-path" property in the overlay.
>
> I understand that this is a use case and a desire to implement a
> solution for the use case. But this is a very non-standard model.
> The proposal exposes a bunch of hardware beyond the pci interface
> in a non-pci method.

It is not the proposal that exposes a bunch of hardware. This device
exposes a bunch of hardware. As you say, it is *beyond the PCI
interface*, so it has zero to do with PCI.

We already support non-discoverable h/w behind a PCI bus. It's called
ISA. There's powerpc DT files in the tree with ISA devices.

> No, just no. Respect the pci interface boundary and do not drag
> devicetree into an effort to pierce and straddle that boundary
> (by adding information about the card, beyond the PCI controller,
> into the system devicetree). Information about dynamically
> discoverable hardware does not belong in the devicetree.

What is discoverable? Nothing more than a VID/PID.

Your suggestion is simply use the VID/PID(s) and then the PCI driver
for the card will have all the details that implies. There's a name
for that: board files. Just like we had a single machine ID per board
registered with RMK and the kernel had to contain all the
configuration details for each machine ID. It's not just 1 card here.
This is a chip and I imagine what's used or not used or how the
downstream peripherals are configured all depend on the customer and
their specific designs.

If DT is not used here, then swnodes (DT bindings embedded in the
kernel or platform_data 2.0) will be. It's exactly the same structure
in the kernel. It's still going to be non-PCI drivers for all the sub
devices. The change in the drivers will not be making them PCI
drivers, but simply converting DT APIs to fwnode APIs which is
pointless churn IMO.

Finally, let's not forget the FPGA usecase. We already support DT
overlays for FPGAs. The fact that the FPGA sits behind a PCI interface
is pretty much irrelevant to the problem. There is simply no way the
kernel is going to contain information about what's within the FPGA
image. If not DT, how should we communicate the FPGA configuration to
the kernel?

Rob