Re: [PATCH] powerpc/chrp: Revert "Move PHB discovery" and "Make hydra_init() static"

From: Oliver O'Halloran
Date: Sat Jul 17 2021 - 11:54:38 EST


On Sat, Jul 17, 2021 at 8:12 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> This patch reverts commit 407d418f2fd4 ("powerpc/chrp: Move PHB
> discovery") and commit 9634afa67bfd ("powerpc/chrp: Make hydra_init()
> static").
>
> Running the upstream kernel on Qemu's brand new "pegasos2" emulation
> results in a variety of backtraces such as

...and actually using it appears to require both manually enabling it
in the QEMU config and finding a random BIOS blob that is no longer
distributed by the manufacturer. Cool.

> Kernel attempted to write user page (a1) - exploit attempt? (uid: 0)
> ------------[ cut here ]------------
> Bug: Write fault blocked by KUAP!
> WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/fault.c:230 do_page_fault+0x4f4/0x920
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.2 #40
> NIP: c0021824 LR: c0021824 CTR: 00000000
> REGS: c1085d50 TRAP: 0700 Not tainted (5.13.2)
> MSR: 00021032 <ME,IR,DR,RI> CR: 24042254 XER: 00000000
>
> GPR00: c0021824 c1085e10 c0f8c520 00000021 3fffefff c1085c60 c1085c58 00000000
> GPR08: 00001032 00000000 00000000 c0ffb3ec 44042254 00000000 00000000 00000004
> GPR16: 00000000 ffffffff 000000c4 000000d0 0188c6e0 01006000 00000001 40b14000
> GPR24: c0ec000c 00000300 02000000 00000000 42000000 000000a1 00000000 c1085e60
> NIP [c0021824] do_page_fault+0x4f4/0x920
> LR [c0021824] do_page_fault+0x4f4/0x920
> Call Trace:
> [c1085e10] [c0021824] do_page_fault+0x4f4/0x920 (unreliable)
> [c1085e50] [c0004254] DataAccess_virt+0xd4/0xe4
>
> and the system fails to boot. Bisect points to commit 407d418f2fd4
> ("powerpc/chrp: Move PHB discovery"). Reverting this patch together with
> commit 9634afa67bfd ("powerpc/chrp: Make hydra_init() static") fixes
> the problem.

The rationale for adding ppc_md.discover_phbs() and shifting all the
platforms over to using it is in commit 5537fcb319d0 ("powerpc/pci:
Add ppc_md.discover_phbs()"). I'd rather not go back to having random
platforms do their PCI init before the kernel has set up the page
allocator. You need to either debug the problem fully or provide
enough replication details that someone who isn't invested in
emulating ancient hardware (i.e. me) can actually replicate the
problem.
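The ordering argument above can be illustrated with a toy userspace model. It is a sketch under stated assumptions and is not the kernel's code. Only ppc_md.discover_phbs() and setup_arch() are real kernel names. Everything else here is hypothetical: struct machdep_model, register_phb(), model_boot(), model_reset() and the counters. The boot sequence is also reduced to three steps: setup_arch(), then allocator init, then the hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not kernel code. Only the names setup_arch() and
 * discover_phbs() mirror the kernel; the rest is hypothetical.
 */
struct machdep_model {
	const char *name;
	void (*setup_arch)(void);    /* early: page allocator not yet up */
	void (*discover_phbs)(void); /* later: page allocator available */
};

/* Stand-in for "the kernel has set up the page allocator". */
static bool page_allocator_ready;

/* Outcome counters so the ordering can be checked. */
static int phbs_registered;
static int early_phb_attempts;

/* Hypothetical: PHB registration that wants the normal allocator. */
static void register_phb(void)
{
	if (!page_allocator_ready) {
		/* PCI init ran before the allocator was up; in this model we bail. */
		early_phb_attempts++;
		return;
	}
	phbs_registered++;
}

/* Hook style: the platform does PHB discovery from discover_phbs. */
static void hook_style_discover_phbs(void)
{
	register_phb();
}

static const struct machdep_model hook_style = {
	.name = "hook-style",
	.setup_arch = NULL,
	.discover_phbs = hook_style_discover_phbs,
};

/* Legacy style: the platform does its PCI init from setup_arch(). */
static void legacy_setup_arch(void)
{
	register_phb();
}

static const struct machdep_model legacy_style = {
	.name = "legacy-style",
	.setup_arch = legacy_setup_arch,
	.discover_phbs = NULL,
};

/* Reset the model's global state between simulated boots. */
static void model_reset(void)
{
	page_allocator_ready = false;
	phbs_registered = 0;
	early_phb_attempts = 0;
}

/* Simplified boot order: setup_arch(), allocator init, then the hook. */
static void model_boot(const struct machdep_model *md)
{
	assert(md != NULL);
	if (md->setup_arch)
		md->setup_arch();
	page_allocator_ready = true; /* stands in for memory init */
	if (md->discover_phbs)
		md->discover_phbs();
}
```

Booting hook_style leaves phbs_registered at 1 and early_phb_attempts at 0. After model_reset(), booting legacy_style only increments early_phb_attempts. The model captures the ordering only. It says nothing about what actually goes wrong on pegasos2.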

Oliver