Re: rmk/for-next bisection: baseline.login on bcm2836-rpi-2-b

From: Ard Biesheuvel
Date: Mon Nov 16 2020 - 07:26:43 EST


On Mon, 16 Nov 2020 at 12:20, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> On Sun, 15 Nov 2020 at 15:11, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> >
> > On Fri, 13 Nov 2020 at 17:25, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > >
> > > On Fri, 13 Nov 2020 at 17:15, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > > >
> > > > On Fri, 13 Nov 2020 at 16:58, Russell King - ARM Linux admin
> > > > <linux@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Nov 13, 2020 at 03:43:27PM +0000, Guillaume Tucker wrote:
> > > > > > On 13/11/2020 10:35, Ard Biesheuvel wrote:
> > > > > > > On Fri, 13 Nov 2020 at 11:31, Guillaume Tucker
> > > > > > > <guillaume.tucker@xxxxxxxxxxxxx> wrote:
> > > > > > >>
> > > > > > >> Hi Ard,
> > > > > > >>
> > > > > > >> Please see the bisection report below about a boot failure on
> > > > > > >> RPi-2b.
> > > > > > >>
> > > > > > >> Reports aren't automatically sent to the public while we're
> > > > > > >> trialing new bisection features on kernelci.org but this one
> > > > > > >> looks valid.
> > > > > > >>
> > > > > > >> There's nothing in the serial console log, probably because it's
> > > > > > >> crashing too early during boot. I'm not sure if other platforms
> > > > > > >> on kernelci.org were hit by this in the same way, but there
> > > > > > >> doesn't seem to be any.
> > > > > > >>
> > > > > > >> The same regression can be see on rmk's for-next branch as well
> > > > > > >> as in linux-next. It happens with both bcm2835_defconfig and
> > > > > > >> multi_v7_defconfig.
> > > > > > >>
> > > > > > >> Some more details can be found here:
> > > > > > >>
> > > > > > >> https://kernelci.org/test/case/id/5fae44823818ee918adb8864/
> > > > > > >>
> > > > > > >> If this looks like a real issue but you don't have a platform at
> > > > > > >> hand to reproduce it, please let us know if you would like the
> > > > > > >> KernelCI test to be re-run with earlyprintk or some debug config
> > > > > > >> turned on, or if you have a fix to try.
> > > > > > >>
> > > > > > >> Best wishes,
> > > > > > >> Guillaume
> > > > > > >>
> > > > > > >
> > > > > > > Hello Guillaume,
> > > > > > >
> > > > > > > That patch did have an issue, but it was already fixed by
> > > > > > >
> > > > > > > https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=9020/1
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=fc2933c133744305236793025b00c2f7d258b687
> > > > > > >
> > > > > > > Could you please double check whether cherry-picking that on top of
> > > > > > > the first bad commit fixes the problem?
> > > > > >
> > > > > > Sadly this doesn't appear to be fixing the issue. I've
> > > > > > cherry-picked your patch on top of the commit found by the
> > > > > > bisection but it still didn't boot, here's the git log
> > > > > >
> > > > > > cbb9656e83ca ARM: 9020/1: mm: use correct section size macro to describe the FDT virtual address
> > > > > > 7a1be318f579 ARM: 9012/1: move device tree mapping out of linear region
> > > > > > e9a2f8b599d0 ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address
> > > > > > 3650b228f83a Linux 5.10-rc1
> > > > > >
> > > > > > Test log: https://people.collabora.com/~gtucker/lava/boot/rpi-2-b/v5.10-rc1-3-gcbb9656e83ca/
> > > > > >
> > > > > > There's no output so it's hard to tell what is going on, but
> > > > > > reverting the bad commmit does make the board to boot (that's
> > > > > > what "revert: PASS" means in the bisect report). So it's
> > > > > > unlikely that there is another issue causing the boot failure.
> > > > >
> > > > > These silent boot failures are precisely what the DEBUG_LL stuff (and
> > > > > early_printk) is supposed to help with - getting the kernel messages
> > > > > out when there is an oops before the serial console is initialised.
> > > > >
> > > >
> > > > If this is indeed related to the FDT mapping, I would assume
> > > > earlycon=... to be usable here.
> > > >
> > > > I will try to reproduce this on a RPi3 but I don't have a RPi2 at
> > > > hand, unfortunately.
> > > >
> > > > Would you mind having a quick try whether you can reproduce this on
> > > > QEMU, using the raspi2 machine model? If so, that would be a *lot*
> > > > easier to diagnose.
> > >
> > > Also, please have a go with 'earlycon=pl011,0x3f201000' added to the
> > > kernel command line.
> >
> > I cannot reproduce this - I don't have the exact same hardware, but
> > for booting the kernel, I think RPi2 and RPi3 should be sufficiently
> > similar, and I can boot on Rpi3 using a u-boot built for rpi2 using
> > your provided dtb for RPi2.
> >
> > What puzzles me is that u-boot reports itself as
> >
> > U-Boot 2016.03-rc1-00131-g39af3d8-dirty
> >
> > RPI Model B+ (0x10)
> >
> > which is the ARMv6 model not the ARMv7, but then the kernel reports
> >
> > CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c53c7d
> >
>
> Another thing I noticed is that the bootloader on these boards loads
> the FDT at address 0x100, which is described by the FDT itself as
> reserved memory, and which typically holds the spin tables used for
> SMP boot.
>
> Could you try loading the DT elsewhere, and see if that changes anything?

I think I narrowed this down to the early DT mapping code, which
considers any DT address that falls inside the first section as 'no
DT', and then relies on the first section mapping of the decompressed
kernel to cover it instead.

Could you please try the following change?


diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 28687fd1240a..7f62c5eccdf3 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -265,10 +265,10 @@ __create_page_tables:
* We map 2 sections in case the ATAGs/DTB crosses a section boundary.
*/
mov r0, r2, lsr #SECTION_SHIFT
- movs r0, r0, lsl #SECTION_SHIFT
+ cmp r2, #0
ldrne r3, =FDT_FIXED_BASE >> (SECTION_SHIFT - PMD_ORDER)
addne r3, r3, r4
- orrne r6, r7, r0
+ orrne r6, r7, r0, lsl #SECTION_SHIFT
strne r6, [r3], #1 << PMD_ORDER
addne r6, r6, #1 << SECTION_SHIFT
strne r6, [r3]