Re: [PATCH v1 0/9] Fix Allwinner D1 boot regression

From: Palmer Dabbelt
Date: Thu Aug 15 2024 - 13:52:06 EST


On Thu, 15 Aug 2024 08:59:37 PDT (-0700), samuel.holland@xxxxxxxxxx wrote:
Hi Emil,

On 2024-08-15 10:07 AM, Emil Renner Berthing wrote:
Samuel Holland wrote:
On 2024-08-15 9:16 AM, Anup Patel wrote:
On Thu, Aug 15, 2024 at 7:41 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

On Thu, Aug 15 2024 at 08:32, Samuel Holland wrote:
On 2024-08-15 8:16 AM, Thomas Gleixner wrote:
Yes. So the riscv timer is not working on this thing or it stops
somehow.

That's correct. With the (firmware) devicetree that Emil is using, the OpenSBI
firmware does not have a timer device, so it does not expose the (optional[1])
SBI time extension, and sbi_set_timer() does nothing.

Sigh. Does RISCV really have to repeat all mistakes which have been made
by x86, ARM and others before? It's known for decades that the kernel
relies on a working timer...

It's even worse than that: RISC-V doesn't even mandate any working _instructions_, much less anything in the platform/firmware.

My apologies for the delay in finding a fix for this issue.

Almost all RISC-V platforms (except this one) have the SBI Timer always
available, and Linux uses a better timer or the Sstc extension whenever
one is available.

So this is the immediate solution: add the CLINT to the firmware devicetree so
that the SBI time extension works, and Linux will boot without any code changes,
albeit with a higher-overhead clockevent device.

But this will mean that you can't update your kernel to v6.9 or newer without
reflashing OpenSBI and u-boot. That's still a regression, right?

Ya, I'd call that a regression. Updating the firmware on these things isn't generally something we can rely on users to do; we've worked around other firmware bugs where we can to avoid forced updates.

I suppose that depends on whether you think the SBI time extension is (or should
have been) mandatory for platforms without Sstc. If the SBI time extension is
mandatory, then this is a firmware bug, and not really Linux's responsibility to
work around.

If the SBI time extension is not mandatory, then Linux needs to be able to
handle platforms where the S-mode visible timer is attached to an external
interrupt controller (PLIC or APLIC), so the irqchip driver needs to be loaded
before time_init() (timer_probe()). So in that case, the bug is a Linux
regression, and we would need to revert the platform driver conversion.

It doesn't really matter what the specs say (aka were intended to say, in RISC-V land): if there's a regression then we have to deal with it. It's not like whatever's written in the specs actually matters; vendors can just do whatever they want, so we're just stuck making the known implementations work.

So I think if the revert is the best fix then we should revert it.

That said: if the CLINT works, could we just add a probing quirk to make it appear on these systems even when it's not in the DT? I'm thinking something like adding a compatible string to the CLINT driver for the SoC (or core or whatever, just something that's already there). We'd probably need a bit of special-case probing code, but it shouldn't be so bad. We've got some other compatibility-oriented DT quirks floating around.
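
To make the quirk idea a bit more concrete, here is a rough user-space sketch of the lookup logic only. It is not kernel code and not a proposed patch: the struct, the table, and the function name are all hypothetical, and the two compatible strings in the table are assumptions that would need checking against the real bindings. It only shows the shape of "match on something already in the DT, and synthesize a CLINT when the node is missing".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustration only: a user-space model of a "probing quirk", not kernel code.
 * Every name here is hypothetical, and the table contents are assumptions.
 */
struct clint_quirk {
	const char *soc_compatible;   /* string already present in the firmware DT */
	const char *clint_compatible; /* what the CLINT driver would bind against */
};

static const struct clint_quirk clint_quirks[] = {
	/* Assumed entry for the D1; verify against the real bindings. */
	{ "allwinner,sun20i-d1", "allwinner,sun20i-d1-clint" },
};

/*
 * Return the CLINT compatible to synthesize, or NULL if no quirk applies.
 * dt_has_clint says whether the firmware DT already describes a CLINT node.
 */
static const char *clint_quirk_lookup(const char *soc_compatible,
				      bool dt_has_clint)
{
	size_t i;

	assert(soc_compatible != NULL);

	/* A DT that already has the node needs no quirk. */
	if (dt_has_clint)
		return NULL;

	for (i = 0; i < sizeof(clint_quirks) / sizeof(clint_quirks[0]); i++) {
		if (strcmp(soc_compatible, clint_quirks[i].soc_compatible) == 0)
			return clint_quirks[i].clint_compatible;
	}

	/* Unknown SoC: do nothing rather than guess. */
	return NULL;
}
```

The point of keying on the SoC string is that old firmware DTs need no changes, and systems that do describe the CLINT are left alone. Where the register base would come from in a real driver is deliberately left out of the sketch.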

Regards,
Samuel