Re: [PATCH 6.9 000/281] 6.9.6-rc1 review

From: Naresh Kamboju
Date: Thu Jun 20 2024 - 11:40:09 EST


On Thu, 20 Jun 2024 at 19:50, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 20.06.24 16:02, Naresh Kamboju wrote:
> > On Thu, 20 Jun 2024 at 19:23, David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 20.06.24 15:14, Naresh Kamboju wrote:
> >>> On Thu, 20 Jun 2024 at 17:59, Greg Kroah-Hartman
> >>> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> On Thu, Jun 20, 2024 at 05:21:09PM +0530, Naresh Kamboju wrote:
> >>>>> On Wed, 19 Jun 2024 at 18:41, Greg Kroah-Hartman
> >>>>> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> This is the start of the stable review cycle for the 6.9.6 release.
> >>>>>> There are 281 patches in this series, all will be posted as a response
> >>>>>> to this one. If anyone has any issues with these being applied, please
> >>>>>> let me know.
> >>>>>>
> >>>>>> Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000.
> >>>>>> Anything received after that time might be too late.
> >>>>>>
> >>>>>> The whole patch series can be found in one patch at:
> >>>>>> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.gz
> >>>>>> or in the git tree and branch at:
> >>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y
> >>>>>> and the diffstat can be found below.
> >>>>>>
> >>>>>> thanks,
> >>>>>>
> >>>>>> greg k-h
> >>>>>
> > >>>>> There are two major issues on arm64 Juno-r2 with Linux stable-rc 6.9.6-rc1:
> >>>>>
> >>>>> Reported-by: Linux Kernel Functional Testing <lkft@xxxxxxxxxx>
> >>>>>
> >>>>> 1)
> > >>>>> The LTP controllers cgroup_fj_stress test cases cause a kernel crash
> > >>>>> on arm64 Juno-r2 when testing in compat mode with the stable-rc 6.9
> > >>>>> kernel.
> >>>>>
> > >>>>> I recently reported this issue on Linux mainline:
> >>>>>
> >>>>> LTP: fork13: kernel panic on rk3399-rock-pi-4 running mainline 6.10.rc3
> >>>>> - https://lore.kernel.org/all/CA+G9fYvKmr84WzTArmfaypKM9+=Aw0uXCtuUKHQKFCNMGJyOgQ@xxxxxxxxxxxxxx/
> >>>>>
> > >>>>> It goes like this:
> >>>>> Unable to handle kernel NULL pointer dereference at virtual address
> >>>>> ...
> >>>>> Insufficient stack space to handle exception!
> >>>>> end Kernel panic - not syncing: kernel stack overflow
> >>>>>
> >>
> >> How is that related to 6.9.6-rc1? That report is from mainline (6.10.rc3).
> >>
> >> Can you share a similar kernel dmesg output from the issue on 6.9.6-rc1?
> >
> > Please use this link for the detailed boot log, test log and crash log:
> > - https://lkft.validation.linaro.org/scheduler/job/7687060#L23314
> >
> > Links to more logs and the build artifacts are provided in the original
> > email thread and at the bottom of this email.
> >
> > crash log:
> > ---
> >

Thanks for investigating this crash report.

> Thanks, so this is something different than the
>
> "BUG: Bad page map in process fork13
> BUG: Bad rss-counter state mm:"
>
> stuff on mainline you referenced.
>
> Looks like some recursive exception until we exhausted the stack.

You are right!
The only thing I see in common between the two reports is that the stack
gets exhausted.

>
>
> Trying to connect the dots here, can you enlighten me how this is
> related to the fork13 mainline report?

I am not sure how these two reports are related.
As a matter of common practice, I shared the earlier report for reference.

> > [ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
> > [ 0.000000] Linux version 6.9.6-rc1 (tuxmake@tuxmake)
> > (aarch64-linux-gnu-gcc (Debian 13.2.0-12) 13.2.0, GNU ld (GNU Binutils
> > for Debian) 2.42) #1 SMP PREEMPT @1718817000
> > ...
> > [ 1786.336761] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> > [ 1786.345564] Mem abort info:
> > [ 1786.348359] ESR = 0x0000000096000004
> > [ 1786.352112] EC = 0x25: DABT (current EL), IL = 32 bits
> > [ 1786.357434] SET = 0, FnV = 0
> > [ 1786.360492] EA = 0, S1PTW = 0
> > [ 1786.363637] FSC = 0x04: level 0 translation fault
> > [ 1786.368523] Data abort info:
> > [ 1786.371405] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> > [ 1786.376900] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > [ 1786.381960] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > [ 1786.387284] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> > [ 1786.387293] Insufficient stack space to handle exception!
> > [ 1786.387296] ESR: 0x0000000096000047 -- DABT (current EL)
> > [ 1786.387302] FAR: 0xffff80008399ffe0
> > [ 1786.387306] Task stack: [0xffff8000839a0000..0xffff8000839a4000]
> > [ 1786.387312] IRQ stack: [0xffff8000837f8000..0xffff8000837fc000]
> > [ 1786.387319] Overflow stack: [0xffff00097ec95320..0xffff00097ec96320]
> > [ 1786.387327] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 6.9.6-rc1 #1
> > [ 1786.387338] Hardware name: ARM Juno development board (r2) (DT)
> > [ 1786.387344] pstate: a00003c5 (NzCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 1786.387355] pc : _prb_read_valid (kernel/printk/printk_ringbuffer.c:2109)
> > [ 1786.387374] lr : prb_read_valid (kernel/printk/printk_ringbuffer.c:2183)
> > [ 1786.387385] sp : ffff80008399ffe0
> > [ 1786.387390] x29: ffff8000839a0030 x28: ffff000800365f00 x27: ffff800082530008
> > [ 1786.387407] x26: ffff8000834e33b8 x25: ffff8000839a00b0 x24: 0000000000000001
> > [ 1786.387423] x23: ffff8000839a00a8 x22: ffff8000830e3e40 x21: 0000000000001e9e
> > [ 1786.387438] x20: 0000000000000000 x19: ffff8000839a01c8 x18: 0000000000000010
> > [ 1786.387453] x17: 72646461206c6175 x16: 7472697620746120 x15: 65636e6572656665
> > [ 1786.387468] x14: 726564207265746e x13: 3037303030303030 x12: 3030303030303030
> > [ 1786.387483] x11: 2073736572646461 x10: ffff800083151ea0 x9 : ffff80008014273c
> > [ 1786.387498] x8 : ffff8000839a0120 x7 : 0000000000000000 x6 : 0000000000000e9f
> > [ 1786.387512] x5 : ffff8000839a00c8 x4 : ffff8000837157c0 x3 : 0000000000000000
> > [ 1786.387526] x2 : ffff8000839a00b0 x1 : 0000000000000000 x0 : ffff8000830e3f58
> > [ 1786.387542] Kernel panic - not syncing: kernel stack overflow
> > [ 1786.387549] SMP: stopping secondary CPUs
> > [ 1787.510055] SMP: failed to stop secondary CPUs 0,4
> > [ 1787.510065] Kernel Offset: disabled
> > [ 1787.510068] CPU features: 0x4,00001061,e0100000,0200421b
> > [ 1787.510076] Memory Limit: none
> > [ 1787.680436] ---[ end Kernel panic - not syncing: kernel stack overflow ]---
>
>
> --
> Cheers,
>
> David / dhildenb

- Naresh