Re: next/master boot bisection: next-20190215 on beaglebone-black

From: Dan Williams
Date: Thu Mar 07 2019 - 10:43:28 EST


On Thu, Mar 7, 2019 at 1:17 AM Guillaume Tucker
<guillaume.tucker@xxxxxxxxxxxxx> wrote:
>
> On 06/03/2019 14:05, Mike Rapoport wrote:
> > On Wed, Mar 06, 2019 at 10:14:47AM +0000, Guillaume Tucker wrote:
> >> On 01/03/2019 23:23, Dan Williams wrote:
> >>> On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
> >>> <guillaume.tucker@xxxxxxxxxxxxx> wrote:
> >>>
> >>> Is there an early-printk facility that can be turned on to see how far
> >>> we get in the boot?
> >>
> >> Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
> >> earlyprintk in the command line. Here's the result, with the
> >> commit cherry picked on top of next-20190304:
> >>
> >> https://lava.collabora.co.uk/scheduler/job/1526326
> >>
> >> [ 1.379522] ti-sysc 4804a000.target-module: sysc_flags 00000222 != 00000022
> >> [ 1.396718] Unable to handle kernel paging request at virtual address 77bb4003
> >> [ 1.404203] pgd = (ptrval)
> >> [ 1.406971] [77bb4003] *pgd=00000000
> >> [ 1.410650] Internal error: Oops: 5 [#1] ARM
> >> [...]
> >> [ 1.672310] [<c07051a0>] (clk_hw_create_clk.part.21) from [<c06fea34>] (devm_clk_get+0x4c/0x80)
> >> [ 1.681232] [<c06fea34>] (devm_clk_get) from [<c064253c>] (sysc_probe+0x28c/0xde4)
> >>
> >> It's always failing at that point in the code. Also when
> >> enabling "debug" on the kernel command line, the issue goes
> >> away (exact same binaries etc.):
> >>
> >> https://lava.collabora.co.uk/scheduler/job/1526327
> >>
> >> For the record, here's the branch I've been using:
> >>
> >> https://gitlab.collabora.com/gtucker/linux/tree/beaglebone-black-next-20190304-debug
> >>
> >> The board otherwise boots fine with next-20190304 (SMP=n), and
> >> also with the patch applied but the shuffle configs set to n.
> >>
> >>> Were there any boot *successes* on ARM with shuffling enabled? I.e.
> >>> clues about what's different about the specific memory setup for
> >>> beagle-bone-black.
> >>
> >> Looking at the KernelCI results from next-20190215, it looks like
> >> only the BeagleBone Black with SMP=n failed to boot:
> >>
> >> https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190215/
> >>
> >> Of course that's not all the ARM boards that exist out there, but
> >> it's a fairly large coverage already.
> >>
> >> As the kernel panic always seems to originate in ti-sysc.c,
> >> there's a chance it's only visible on that platform... I'm doing
> >> a KernelCI run now with my test branch to double check that,
> >> it'll take a few hours so I'll send an update later if I get
> >> anything useful out of it.
>
> Here's the result, there were a couple of failures but some were
> due to infrastructure errors (nyan-big) and I'm not sure about
> what was the problem with the meson boards:
>
> https://staging.kernelci.org/boot/all/job/gtucker/branch/kernelci-local/kernel/next-20190304-1-g4f0b547b03da/
>
> So there's no clear indicator that the shuffle config is causing
> any issue on any other platform than the BeagleBone Black.
>
> >> In the meantime, I'm happy to try out other things with more
> >> debug configs turned on or any potential fixes someone might
> >> have.
> >
> > ARM is the only arch that sets ARCH_HAS_HOLES_MEMORYMODEL to 'y'. Maybe the
> > failure has something to do with it...
> >
> > Guillaume, can you try this patch:

Mike, I appreciate the help!

>
> Sure, it doesn't seem to be fixing the problem though:
>
> https://lava.collabora.co.uk/scheduler/job/1527471
>
> I've added the patch to the same branch based on next-20190304.
>
> I guess this needs to be debugged a little further to see what
> the panic really is about. I'll see if I can spend a bit more
> time on it this week, unless there's any BeagleBone expert
> available to help or if someone has another fix to try out.

Thanks for the help, Guillaume!

I went ahead and acquired one of these boards to see if I can
debug this locally.