Re: [PATCH v1 0/9] Fix Allwinner D1 boot regression

From: Palmer Dabbelt
Date: Thu Aug 15 2024 - 13:52:09 EST


On Wed, 14 Aug 2024 10:30:48 PDT (-0700), tglx@xxxxxxxxxxxxx wrote:
On Wed, Aug 14 2024 at 16:56, Emil Renner Berthing wrote:
As described in the thread below[1] I haven't been able to boot my
boards based on the Allwinner D1 SoC since 6.9 where you converted the
SiFive PLIC driver to a platform driver.

This is clearly a regression and there hasn't really been much progress
on fixing the issue since then, so here is the revert that fixes it.

If no other fix is found before 6.11 I suggest we apply this.

So this mess has been ignored for two months now?

From the pastebin in the initial report:

[ 0.000000] irq: no irq domain found for interrupt-controller@10000000 !
[ 0.000000] Failed to map interrupt for /soc/timer@2050000
[ 0.000000] Failed to initialize '/soc/timer@2050000': -22

This comes back with -EINVAL. So the timer cannot find an interrupt,
which makes it pretty obvious why the system stops booting, unless there
is some other timer available.

This is obviously related to the SUN4I_TIMER because that message went
away when it was disabled according to the next pastebin.

Obviously that can't work because the SUN4I timer driver is using
timer_of_init() which cannot handle deferred probing.
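To make the ordering problem concrete, here is a toy model of it in Python. It is not kernel code: map_timer_irq() and boot() are hypothetical names made up for this sketch, and it assumes that the early timer init gets exactly one attempt before any platform driver is probed. The error values are the kernel's EINVAL (22, matching the log above) and EPROBE_DEFER (517).

```python
EINVAL = 22         # matches the "-22" in the boot log
EPROBE_DEFER = 517  # kernel-internal "try again later" code

PLIC = "interrupt-controller@10000000"


def map_timer_irq(domains, can_defer):
    """Hypothetical stand-in for mapping the timer interrupt."""
    if PLIC in domains:
        return 0
    # Without deferral support, a missing irq domain is a hard failure.
    return -EPROBE_DEFER if can_defer else -EINVAL


def boot(timer_can_defer):
    """Return the result of every timer init attempt, in order."""
    domains = set()
    results = []

    # Phase 1: early timer init. One attempt, no retry (assumption of
    # this model). The PLIC has not registered its irq domain yet.
    if not timer_can_defer:
        results.append(map_timer_irq(domains, can_defer=False))

    # Phase 2: platform driver probing with a retry queue. The timer is
    # deliberately queued ahead of the PLIC to show the deferral.
    pending = ["timer", "plic"] if timer_can_defer else ["plic"]
    while pending:
        name = pending.pop(0)
        if name == "plic":
            domains.add(PLIC)  # PLIC probes and registers its domain
            continue
        ret = map_timer_irq(domains, can_defer=True)
        results.append(ret)
        if ret == -EPROBE_DEFER:
            pending.append(name)  # retry after other drivers have probed

    return results
```

In this model boot(False) ends with the timer failing once and for all, while boot(True) sees one deferral followed by a successful retry once the PLIC has registered its domain.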

Daniel: There was a partial fix for the sun4i driver, which you said you
applied:

https://lore.kernel.org/all/20240312192519.1602493-1-samuel.holland@xxxxxxxxxx

But that thing never materialized in a pull request.

And of course everyone involved ignored the problem since March 13th
2024, i.e. almost half a year.

Seriously?

Can you RISCV folks get your act together and make sure to fix things you
broke on the way? Especially when Emil reported this nobody pointed him
to this patch and nobody noticed that it's still not merged?

It took me less than 15 minutes to find that patch and the correlation,
but this is absolutely not my job.

Sorry, I guess I'd just sort of been ignoring the platform-specific side of things because it's so frustrating to deal with, but that's led to a bunch of breakages so it's obviously the wrong thing to do.

I'm seriously grumpy about that. This is not how it works. If you break
stuff, then you take care to fix it before you shove more changes into
the tree and waste my time.

I'm very much inclined to take the reverts right now, send them to Linus
for -rc5 tagged with cc: stable and ignore/nak any irqchip related riscv
patches until the next merge window is over.

Acked-by: Palmer Dabbelt <palmer@xxxxxxxxxxxx>

if you want to take the revert.

IIUC the patch above doesn't actually fix it, that's what led to just sending the reverts -- at least reverts are better than breaking users. I'll post over there too...

And it's no big deal if we're in the doghouse for a bit. Regressions should get fixed faster than this, so we deserve it.

Probably also another sign we're way too focused on getting new features merged, as that's coming at the expense of making existing platforms work. IMO we've been way too focused on getting support for specs that don't even have implementations, and not enough on building real working systems.

Emil, can you give that sun4i fix a test ride please?

Thanks,

tglx