Re: [Bisected Regression] OLPC XO-1.5: Internal drive and SD card (mmcblk*) gone since commit ea718c699055

From: Rob Herring
Date: Thu Sep 09 2021 - 11:17:38 EST


On Thu, Sep 9, 2021 at 9:09 AM Andre Muller <andre.muller@xxxxxx> wrote:
>
> On 09/09/2021 00.31, Rob Herring wrote:
> > On Tue, Sep 7, 2021 at 10:15 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> >>
> >> On Tue, Sep 7, 2021 at 7:12 PM Andre Muller <andre.muller@xxxxxx> wrote:
> >>>
> >>> On 08/09/2021 00.05, Saravana Kannan wrote:
> >>>> On Sun, Sep 5, 2021 at 1:15 AM Andre Muller <andre.muller@xxxxxx> wrote:
> >>>>>
> >>>>> With linux-5.13 and linux-5.14, the internal drive and SD card reader are gone from the XO-1.5. I bisected the issue to come up with ea718c699055:
> >>>>>
> >>>>> # first bad commit: [ea718c699055c8566eb64432388a04974c43b2ea] Revert "Revert "driver core: Set fw_devlink=on by default""
> >>>>>
> >>>>> The /dev/mmcblk* nodes are not generated since this patch.
> >>>>>
> >>>>> Please find the output of lspci -vv and lshw below.
> >>>>>
> >>>>> I will be happy to provide more info and/or test patches.
> >>>>
> >>>> Hi Andre,
> >>>>
> >>>> Can you point me to the dts file in upstream that corresponds to this system?
> >>>>
> >>>> Also, if you can give the output of:
> >>>> cat /sys/kernel/debug/devices_deferred
> >>>
> >>> Hi Saravana,
> >>>
> >>>
> >>> /sys/kernel/debug/devices_deferred is empty.
> >>> I used the last good commit b6f617.
> >>
> >> Sorry, I wanted that with the bad commit.
>
> Uh-oh, my bad...
>
> The bad case says
> # cat devices_deferred
> 0000:00:0c.0
>
> That's the SD Host controller.
>
> >>
> >>>
> >>> The XO-1.5 has an x86 compatible VIA C7 processor.
> >>> It uses the VX855 chip for about all I/O tasks, including SDIO.
> >>> I am not aware of a device tree file for it.
> >>>
> >>> It is a bit of a strange beast: it uses OFW to initialize the hardware and provide a FORTH shell,
> >>> which also serves as the boot manager, configured via FORTH scripts.
> >>>
> >>> From the linux side of the fence, dmesg's line 2 is:
> >>>
> >>> "OFW detected in memory, cif @ 0xff83ae68 (reserving top 8MB)"
> >>>
> >>> AIUI, this mechanism is used in lieu of a device tree file, like UEFI on most x86 hardware.
> >>> But my understanding of device trees is severely limited, I might be all wrong.
> >>
> >> Uhh... I'm so confused. If Linux doesn't use OF, then none of the code
> >> enabled by fw_devlink=on should be executed.
> >
> > Linux does, but maybe not for memory (like UEFI on arm64).
> >
> >> The only thing that might remotely even execute is:
> >> efifb_add_links() in drivers/firmware/efi/efi-init.c
> >>
> >> If you want you can just do an early return 0; in that to see if it
> >> makes a difference (unlikely).
> >>
> >> Rob, Do you know what's going on with OLPC and DT?
> >
> > Not really. I have an XO-1 DT dump[1]. It's probably a similar looking
> > DT though. It's pretty ancient lacking anything we've invented for DT
> > in the last 10 years. There's not really much to it as about the only
> > phandle I see is for interrupts.
> >
> >>> Anyway, the firmware source is here:
> >>> http://dev.laptop.org/git/users/quozl/openfirmware/
> >>>
> >>> This file is the closest dt-analogous thing for the XO-1.5 I can find therein:
> >>> cpu/x86/pc/olpc/via/devices.fth
> >>
> >> That file is all gibberish to me.
> >
> > Running this on a booted system would help:
> >
> > dtc -f -I fs -O dts /proc/device-tree > dump.dts
>
> Ah, thanks. I never knew about the DT in there...
> XO-1.5_dump.dts is attached.
>
> >
> > If you don't have dtc on the system, then you'll have to zip up
> > /proc/device-tree contents and run dtc elsewhere (or just post that).
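
A minimal sketch of the "zip it up and run dtc elsewhere" route, assuming
tar and gzip are available on the target. The function names pack_dt and
unpack_dt and the archive name are made up for illustration. Archiving the
directory contents through -C avoids storing /proc/device-tree as a bare
symlink:

```shell
# pack_dt [SRC] [ARCHIVE]: archive the contents of a device-tree directory.
# Hypothetical helper; defaults assume the live tree at /proc/device-tree.
pack_dt() {
    tar -C "${1:-/proc/device-tree}" -czf "${2:-device-tree.tar.gz}" .
}

# unpack_dt ARCHIVE DIR: extract such an archive into DIR on another machine.
unpack_dt() {
    mkdir -p "$2" && tar -C "$2" -xzf "$1"
}

# Example (not run here):
#   pack_dt                      # on the XO-1.5
#   unpack_dt device-tree.tar.gz dt && dtc -f -I fs -O dts dt > dump.dts
```

The second half is the same dtc invocation as above, only pointed at the
unpacked copy instead of /proc/device-tree.
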
> >
> >>> My machine runs the latest version:
> >>> http://wiki.laptop.org/go/OLPC_Firmware_q3c17
> >>>
> >>> The XO-1.5 hardware specs are here:
> >>> http://wiki.laptop.org/images/f/f0/CL1B_Hdwe_Design_Spec.pdf
> >>> http://wiki.laptop.org/go/Hardware_specification_1.5
> >>>
> >>> Would the .config or dmesg help?
> >>
> >> At this point, why not? When you do send them, please send them as
> >> attachments and not inline.
> >>
> >> Also, when you collect the dmesg logs, the following could help:
> >> Enable the existing dev_dbg logs in these functions:
> >> device_link_add()
> >> device_links_check_suppliers()
> >>
> >> And add the following log to fwnode_link_add():
> >> +++ b/drivers/base/core.c
> >> @@ -87,6 +87,8 @@ int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup)
> >>  		goto out;
> >>  	}
> >>
> >> +	pr_info("Link fwnode %pfwP as a consumer of fwnode %pfwP\n", con, sup);
> >> +
> >
>
> OK. The dmesg with debug info is attached as well (for the broken case).
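
For anyone reproducing this: one way to enable the dev_dbg output requested
above is dynamic debug, assuming the kernel is built with
CONFIG_DYNAMIC_DEBUG. The device links are set up during boot, so the query
probably has to go on the kernel command line rather than being written to
/sys/kernel/debug/dynamic_debug/control afterwards. A small sketch that
builds the argument (dyndbg_arg is a made-up helper name):

```shell
# dyndbg_arg FUNC...: print a dyndbg= kernel command-line argument that
# enables the pr_debug/dev_dbg call sites in each named function.
dyndbg_arg() {
    q=
    for f in "$@"; do
        # Queries are separated by ';' inside one quoted dyndbg= string.
        q="${q:+$q; }func $f +p"
    done
    printf 'dyndbg="%s"\n' "$q"
}

# Example:
#   dyndbg_arg device_link_add device_links_check_suppliers
```

The printed string still has to be added to the boot configuration by hand;
how that is done on the XO-1.5 is not covered here.
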

Humm, ACPI and DT together...

Looks to me like it's waiting for the wrong interrupt-parent. The log
says it is waiting for 'interrupt-controller@i20', which is the only
interrupt-controller found in the DT, but the parent is the PCI bridge
and, from there, whatever its interrupt-map points to. That target is
not clear, as the phandle (0x767a4) doesn't exist in the DT. I suppose
the parent is defined in ACPI?
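
A rough way to repeat that check against a dtc dump, as a sketch only. It
assumes phandles show up in the dump as "phandle" or "linux,phandle"
properties, which may not hold for a tree built from OFW, and the helper
names are made up:

```shell
# phandle_defined DTS VALUE: succeed if some node in the DTS dump declares
# VALUE as its phandle ("phandle" or the older "linux,phandle" spelling).
phandle_defined() {
    grep -Eq "^[[:space:]]*(linux,)?phandle = <$2>;" "$1"
}

# phandle_users DTS VALUE: print the lines (with line numbers) that mention
# VALUE in any other property, e.g. interrupt-parent or interrupt-map.
phandle_users() {
    grep -nw "$2" "$1" | grep -Ev '(linux,)?phandle = <'
}

# Example (not run here):
#   phandle_users XO-1.5_dump.dts 0x767a4
#   phandle_defined XO-1.5_dump.dts 0x767a4 || echo "no node has this phandle"
```

A value that has users but no defining node is the situation described
above.
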

If there's not an easy fix, just disable devlinks for x86. There's
only one other DT platform, ce4100, and I really doubt it is even used
at all.
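
Independent of any kernel change, booting the bad kernel with fw_devlink=off
(or fw_devlink=permissive) on the command line should be a quick way to
confirm that devlinks are what keeps the SD host controller deferred. This
is only a test aid, not the fix suggested above. A sketch for checking what
a given command line requests (fw_devlink_mode is a made-up helper name):

```shell
# fw_devlink_mode "CMDLINE": print the value of the last fw_devlink=
# parameter in a kernel command line, or "unset" if it is absent
# (in which case the kernel's built-in default applies).
fw_devlink_mode() {
    mode=$(printf '%s\n' "$1" | tr ' ' '\n' |
        sed -n 's/^fw_devlink=//p' | tail -n 1)
    printf '%s\n' "${mode:-unset}"
}

# Example on a running system:
#   fw_devlink_mode "$(cat /proc/cmdline)"
```

If /dev/mmcblk* comes back with fw_devlink=off and devices_deferred is empty
again, that points at the DT-derived links rather than the SDHCI driver.
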

Rob