Qemu-arm64: LTP: cfs_bandwidth01: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038

From: Naresh Kamboju
Date: Tue Sep 12 2023 - 03:56:21 EST


The following kernel crash was noticed on Linux stable-rc 6.5.3-rc1 on
qemu-arm64 while running the LTP sched test cases.

This is not always reproducible.

Has anyone noticed LTP cfs_bandwidth01 causing a kernel crash on any
device or qemu-* target?

I still need to check for similar crashes on other Linux trees and branches.

Boot log and test log:
---------------------
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
[ 0.000000] Linux version 6.5.3-rc1 (tuxmake@tuxmake) (Debian clang version 18.0.0 (++20230910112057+710b5a12324e-1~exp1~20230910112229.889), Debian LLD 18.0.0) #1 SMP PREEMPT @1694441978
[ 0.000000] KASLR enabled
[ 0.000000] random: crng init done
[ 0.000000] Machine model: linux,dummy-virt
...
running LTP sched tests
...
cfs_bandwidth01.c:129: TPASS: Workers exited
cfs_bandwidth01.c:117: TPASS: Scheduled bandwidth constrained workers
cfs_bandwidth01.c:54: TINFO: Set 'level2/cpu.max' = '5000 10000'
<1>[ 74.455327] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
<1>[ 74.456395] Mem abort info:
<1>[ 74.456639] ESR = 0x0000000097880004
<1>[ 74.458273] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 74.458859] SET = 0, FnV = 0
<1>[ 74.459495] EA = 0, S1PTW = 0
<1>[ 74.460171] FSC = 0x04: level 0 translation fault
<1>[ 74.460799] Data abort info:
<1>[ 74.461388] Access size = 4 byte(s)
<1>[ 74.462068] SSE = 0, SRT = 8
<1>[ 74.462713] SF = 0, AR = 0
<1>[ 74.463257] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[ 74.463996] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[ 74.465120] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001029d6000
<1>[ 74.465818] [0000000000000038] pgd=0000000000000000, p4d=0000000000000000
<0>[ 74.468416] Internal error: Oops: 0000000097880004 [#1] PREEMPT SMP
<4>[ 74.469489] Modules linked in: fuse drm dm_mod ip_tables x_tables
<4>[ 74.470964] CPU: 0 PID: 435 Comm: cfs_bandwidth01 Not tainted 6.5.3-rc1 #1
<4>[ 74.471789] Hardware name: linux,dummy-virt (DT)
<4>[ 74.473045] pstate: 634000c9 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
<4>[ 74.473785] pc : set_next_entity+0xc0/0x1f8
<4>[ 74.475461] lr : pick_next_task_fair+0x204/0x3b8
<4>[ 74.476989] sp : ffff8000807eb870
<4>[ 74.477346] x29: ffff8000807eb870 x28: ffff0000c4e3b750 x27: ffffcb93e8e19008
<4>[ 74.478392] x26: ffff0000c4e3b0c0 x25: ffffcb93e8ab4828 x24: ffff0000c0354a00
<4>[ 74.479263] x23: ffff8000807eb900 x22: 0000000000000000 x21: ffff0000ff5b1300
<4>[ 74.480401] x20: ffff0000ff5b1300 x19: 0000000000000000 x18: 0000000000000000
<4>[ 74.481417] x17: 000000000000ba7e x16: 0000000000000606 x15: 000000000117d17a
<4>[ 74.482733] x14: 0000000000000000 x13: 0000000f0f4bc800 x12: 00000000000002b0
<4>[ 74.484181] x11: 0000000f0f4bc800 x10: 0000000cf6ad6bd1 x9 : ffffcb93e6af8e4c
<4>[ 74.485229] x8 : 0000000000000000 x7 : ffffcb93e8a3ccac x6 : 0000000000000003
<4>[ 74.486131] x5 : 000000008040002b x4 : 0000ffffbef0c000 x3 : ffff0000ff5b1200
<4>[ 74.487012] x2 : ffff0000c39efc00 x1 : 0000000000000000 x0 : ffff0000ff5b1300
<4>[ 74.488236] Call trace:
<4>[ 74.488608] set_next_entity+0xc0/0x1f8
<4>[ 74.489280] pick_next_task_fair+0x204/0x3b8
<4>[ 74.489987] __schedule+0x1e0/0x9c8
<4>[ 74.490903] schedule+0x134/0x1b8
<4>[ 74.491632] schedule_preempt_disabled+0x90/0x108
<4>[ 74.492392] rwsem_down_write_slowpath+0x288/0x6f0
<4>[ 74.493056] down_write+0x48/0xb0
<4>[ 74.493606] unlink_anon_vmas+0x148/0x1b0
<4>[ 74.494222] free_pgtables+0x10c/0x200
<4>[ 74.494800] exit_mmap+0x174/0x3c0
<4>[ 74.495177] __mmput+0x48/0x150
<4>[ 74.495761] mmput+0x34/0x70
<4>[ 74.496058] exit_mm+0xbc/0x148
<4>[ 74.497651] do_exit+0x22c/0x910
<4>[ 74.498212] do_group_exit+0xa4/0xb0
<4>[ 74.498870] __arm64_sys_exit_group+0x24/0x30
<4>[ 74.499484] invoke_syscall+0x4c/0x120
<4>[ 74.499834] el0_svc_common+0xd0/0x110
<4>[ 74.500196] do_el0_svc+0x3c/0xb8
<4>[ 74.500475] el0_svc+0x30/0x90
<4>[ 74.500746] el0t_64_sync_handler+0x84/0x100
<4>[ 74.501309] el0t_64_sync+0x190/0x198
<0>[ 74.502156] Code: f900293f f9403908 b5ffff48 17ffffde (b9403a68)
<4>[ 74.503735] ---[ end trace 0000000000000000 ]---
<6>[ 74.504727] note: cfs_bandwidth01[435] exited with irqs disabled
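
The fields printed in the oops above can be cross-checked against the raw
ESR value and the faulting instruction word (the one in parentheses on the
"Code:" line). The following is a minimal Python sketch, not kernel code:
the function names are made up for illustration, and the bit positions are
taken from the architectural ESR_ELx data-abort layout and the A64
LDR (immediate, unsigned offset) encoding. It handles only that one
instruction form.

```python
# Cross-check of the oops fields. Helper names are illustrative only.

def decode_esr(esr: int) -> dict:
    """Split a data-abort ESR_ELx value into the fields the log prints."""
    return {
        "EC": (esr >> 26) & 0x3F,                   # exception class
        "IL": (esr >> 25) & 0x1,                    # 1 = 32-bit instruction
        "ISV": (esr >> 24) & 0x1,                   # 1 = SAS/SSE/SRT/SF/AR valid
        "access_bytes": 1 << ((esr >> 22) & 0x3),   # SAS holds log2(size)
        "SSE": (esr >> 21) & 0x1,
        "SRT": (esr >> 16) & 0x1F,                  # register being transferred
        "SF": (esr >> 15) & 0x1,
        "AR": (esr >> 14) & 0x1,
        "WnR": (esr >> 6) & 0x1,                    # 0 = read, 1 = write
        "FSC": esr & 0x3F,                          # fault status code
    }


def decode_ldr_w_uimm(insn: int) -> dict:
    """Decode a 32-bit LDR Wt, [Xn, #imm] (unsigned-offset form only)."""
    if (insn >> 22) != 0b1011100101:
        raise ValueError("not LDR (immediate, unsigned offset), 32-bit variant")
    return {
        "rt": insn & 0x1F,
        "rn": (insn >> 5) & 0x1F,
        "offset": ((insn >> 10) & 0xFFF) * 4,       # imm12 scaled by 4 bytes
    }


esr = decode_esr(0x97880004)          # value from the "ESR =" line
insn = decode_ldr_w_uimm(0xB9403A68)  # word in parentheses on the "Code:" line
x19 = 0x0                             # base register value from the register dump
fault_address = x19 + insn["offset"]
```

Under these assumptions the instruction decodes to ldr w8, [x19, #0x38].
With x19 equal to zero this gives the reported fault address 0x38, and the
decoded ESR fields agree with the log (EC 0x25, 4-byte read, SRT 8,
FSC 0x04). That is consistent with a NULL structure pointer being read at
offset 0x38 inside set_next_entity(); which structure and field are
involved cannot be determined from the log alone.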

Links:
-----
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2VFpDOMEgzroNyiP9SSlxRxHsMH
- https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.5.y/build/v6.5.2-740-g7bfd1316ceae/testrun/19901770/suite/log-parser-test/tests/
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2VFpB1ieNZSp5zh0joVGtoMn7RG/

Steps to reproduce:
----------------
# To install tuxrun to your home directory at ~/.local/bin:
# pip3 install -U --user tuxrun==0.49.2
#
# Or install a deb/rpm depending on the running distribution
# See https://tuxmake.org/install-deb/ or
# https://tuxmake.org/install-rpm/
#
# See https://tuxrun.org/ for complete documentation.
#

tuxrun --runtime podman --device qemu-arm64 --boot-args rw \
  --kernel https://storage.tuxsuite.com/public/linaro/lkft/builds/2VFpB1ieNZSp5zh0joVGtoMn7RG/Image.gz \
  --modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2VFpB1ieNZSp5zh0joVGtoMn7RG/modules.tar.xz \
  --rootfs https://storage.tuxboot.com/debian/bookworm/arm64/rootfs.ext4.xz \
  --parameters SKIPFILE=skipfile-lkft.yaml \
  --parameters SHARD_NUMBER=4 \
  --parameters SHARD_INDEX=2 \
  --image docker.io/linaro/tuxrun-dispatcher:v0.49.2 \
  --tests ltp-sched \
  --timeouts boot=30 ltp-sched=30 \
  --overlay https://storage.tuxboot.com/overlays/debian/bookworm/arm64/ltp/20230516/ltp.tar.xz
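
Since the crash is not always reproducible, the command above may need to
be repeated several times. The following is a rough Python sketch of a
retry wrapper. It assumes the guest console output reaches the command's
stdout or stderr, and the function name, the max_runs default, and the
stand-in demo command are hypothetical, not part of tuxrun or LKFT.

```python
import subprocess
import sys

# Signature string copied from the oops in the log above.
OOPS_MARKER = "Unable to handle kernel NULL pointer dereference"


def run_until_oops(cmd, max_runs=20):
    """Run cmd up to max_runs times.

    Returns the 1-based number of the first run whose output contains the
    oops marker, or None if no run produced it.
    """
    for attempt in range(1, max_runs + 1):
        # Merge stderr into stdout so the marker is found wherever it is logged.
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            errors="replace",
            check=False,
        )
        if OOPS_MARKER in result.stdout:
            return attempt
    return None


# Stand-in command for demonstration only; substitute the tuxrun argument
# list (as a list of strings) to use it for real.
demo_cmd = [sys.executable, "-c", f"print({OOPS_MARKER!r})"]
first_hit = run_until_oops(demo_cmd, max_runs=3)
```

The wrapper keeps output in memory only; saving each run's log to a file
would be a natural extension when the full boot log is needed.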


--
Linaro LKFT
https://lkft.linaro.org