Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
From: Manu Bretelle
Date: Tue Jul 09 2024 - 15:07:10 EST
________________________________________
From: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Sent: Tuesday, July 9, 2024 10:44 AM
To: KP Singh <kpsingh@xxxxxxxxxx>
Cc: Puranjay Mohan <puranjay@xxxxxxxxxx>; Andrii Nakryiko <andrii@xxxxxxxxxx>; Eduard Zingerman <eddyz87@xxxxxxxxx>; Mykola Lysenko <mykolal@xxxxxxxx>; Alexei Starovoitov <ast@xxxxxxxxxx>; Martin KaFai Lau <martin.lau@xxxxxxxxx>; Song Liu <song@xxxxxxxxxx>; Yonghong Song <yonghong.song@xxxxxxxxx>; John Fastabend <john.fastabend@xxxxxxxxx>; Stanislav Fomichev <sdf@xxxxxxxxxx>; Hao Luo <haoluo@xxxxxxxxxx>; Jiri Olsa <jolsa@xxxxxxxxxx>; Shuah Khan <shuah@xxxxxxxxxx>; bpf@xxxxxxxxxxxxxxx <bpf@xxxxxxxxxxxxxxx>; linux-kselftest@xxxxxxxxxxxxxxx <linux-kselftest@xxxxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx <linux-kernel@xxxxxxxxxxxxxxx>; Manu Bretelle <chantra@xxxxxxxx>; Florent Revest <revest@xxxxxxxxxx>
Subject: Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
On 7/8/24 6:42 PM, KP Singh wrote:
> On Mon, Jul 8, 2024 at 6:09 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>> On 7/8/24 5:35 PM, Puranjay Mohan wrote:
>>> Daniel Borkmann <daniel@xxxxxxxxxxxxx> writes:
>>>> On 7/8/24 5:26 PM, KP Singh wrote:
>>>>> On Mon, Jul 8, 2024 at 5:00 PM Puranjay Mohan <puranjay@xxxxxxxxxx> wrote:
>>>>>> Daniel Borkmann <daniel@xxxxxxxxxxxxx> writes:
>>>>>>> On 7/5/24 4:50 PM, Puranjay Mohan wrote:
>>>>>>>> fexit_sleep test runs successfully now on the CI so remove it from the
>>>>>>>> deny list.
>>>>>>>
>>>>>>> Do you happen to know which commit fixed it? If yes, might be nice to have it
>>>>>>> documented in the commit message.
>>>>>>
>>>>>> Actually, I never saw this test failing on my local setup and yesterday
>>>>>> I tried running it on the CI where it passed as well. So, I assumed that
>>>>>> this would be fixed by some commit. I am not sure which exact commit
>>>>>> might have fixed this.
>>>>>>
>>>>>> Manu, Martin
>>>>>>
>>>>>> When this was added to the deny list was this failing every time and did
>>>>>> you have some reproducer for this. If there is a reproducer, I can try
>>>>>> fixing it but when ran normally this test never fails for me.
>>>>>
>>>>> I think this never worked until
>>>>> https://lore.kernel.org/lkml/20230405180250.2046566-1-revest@xxxxxxxxxxxx/
>>>>> was merged, FTrace direct calls was blocking tracing programs on ARM,
>>>>> since then it has always worked.
>>>>
>>>> Awesome, thanks! I'll add this to the commit desc then when applying.
>>>
>>> The commit that added this to the deny list said:
>>> 31f4f810d533 ("selftests/bpf: Add fexit_sleep to DENYLIST.aarch64")
>>>
>>> ```
>>> It is reported that the fexit_sleep never returns in aarch64.
>>> The remaining tests cannot start.
>>> ```
>
> It may also have something to do with sleepable programs. But I think
> it's generally in the category of "BPF tracing was catching up with
> ARM", it has now.
Hm, the latest run actually hangs in fexit_sleep (which is the test right after
fexit_bpf2bpf). So looks like this was too early. It seems some CI runs pass on
arm64 but others fail:
https://github.com/kernel-patches/bpf/actions/runs/9859826851/job/27224868398 (fail)
https://github.com/kernel-patches/bpf/actions/runs/9859837213/job/27224955045 (pass)
Puranjay, do you have a chance to look into this again?
Probably unrelated, but when I tried to reproduce this using QEMU in full emulation mode [0], I got a kernel crash for fexit_sleep, and also for fexit_bpf2bpf and fentry_fexit.
The stack trace looks like this (for fentry_fexit):
root@(none):/mnt/vmtest/selftests/bpf# ./test_progs -v -t fentry_fexit
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_fentry_fexit:PASS:fentry_skel_load 0 nsec
test_fentry_fexit:PASS:fexit_skel_load 0 nsec
test_fentry_fexit:PASS:fentry_attach 0 nsec
test_fentry_fexit:PASS:fexit_attach 0 nsec
Unable to handle kernel paging request at virtual address ffff0000c2a80e68
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
[ffff0000c2a80e68] pgd=1000000042f28003, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod(OE)]
CPU: 0 PID: 97 Comm: test_progs Tainted: G OE 6.10.0-rc6-gb0eedd920017-dirty #67
Hardware name: linux,dummy-virt (DT)
pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __bpf_tramp_enter+0x58/0x190
lr : __bpf_tramp_enter+0xd8/0x190
sp : ffff800084afbc10
x29: ffff800084afbc10 x28: fff00000c28c2e80 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000050 x24: 0000000000000000
x23: 000000000000000a x22: fff00000c28c2e80 x21: 0000ffffed100070
x20: ffff800082032938 x19: ffff0000c2a80c00 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffed100070
x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
x11: 0000000000020007 x10: 0000000000000007 x9 : 00000000ffffffff
x8 : 0000000000004008 x7 : ffff80008218fa78 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000086db7919 x3 : 0000000095481a34
x2 : 0000000000000001 x1 : fff00000c28c2e80 x0 : 0000000000000001
Call trace:
__bpf_tramp_enter+0x58/0x190
bpf_trampoline_6442499844+0x44/0x158
bpf_fentry_test1+0x8/0x10
bpf_prog_test_run_tracing+0x190/0x328
__sys_bpf+0x844/0x2148
__arm64_sys_bpf+0x2c/0x48
invoke_syscall+0x4c/0x118
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x4c/0x120
el0t_64_sync_handler+0xc0/0xc8
el0t_64_sync+0x190/0x198
Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00,00000006,8c13bd78,576676af
Memory Limit: none
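As a sanity check on the "Mem abort info" lines above, the ESR value can be split into its fields by hand. This is only a minimal sketch: it assumes the architectural ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]) and, for data aborts, FSC in ISS[5:0], WnR in ISS[6] and ISV in ISS[24]. The name decode_esr is made up for illustration.

```python
def decode_esr(esr: int) -> dict:
    """Split an arm64 ESR_ELx value into the fields printed in the oops.

    Assumed layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
    FSC, WnR and ISV are only meaningful when EC is a data abort (0x24/0x25).
    """
    iss = esr & 0x1FFFFFF
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class
        "il": (esr >> 25) & 0x1,    # 1 means a 32-bit instruction
        "iss": iss,                 # instruction specific syndrome
        "isv": (iss >> 24) & 0x1,   # syndrome valid bit
        "wnr": (iss >> 6) & 0x1,    # 0 means the access was a read
        "fsc": iss & 0x3F,          # fault status code
    }


if __name__ == "__main__":
    fields = decode_esr(0x96000004)  # ESR value from the oops above
    for name, value in fields.items():
        print(f"{name.upper():>3} = {value:#x}")
```

For 0x96000004 this gives EC 0x25, IL set, FSC 0x04 and WnR clear, which agrees with what the kernel printed: a read that took a level 0 translation fault at the current EL.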
For "fexit_sleep" and "fexit_bpf2bpf" respectively:
$ ( cd 9859826851 && vmtest -k kbuild-output/arch/arm64/boot/Image.gz -r ../aarch64-rootfs -a aarch64 '/bin/mount bpffs /sys/fs/bpf -t bpf && ip link set lo up && cd /mnt/vmtest/selftests/bpf/ && ./test_progs -v -t fexit_sleep' )
=> Image.gz
===> Booting
===> Setting up VM
===> Running command
root@(none):/# bpf_testmod: loading out-of-tree module taints kernel.
bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
Unable to handle kernel paging request at virtual address ffff0000c19c2668
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
[ffff0000c19c2668] pgd=1000000042f28003, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in: bpf_testmod(OE)
CPU: 1 PID: 91 Comm: test_progs Tainted: G OE 6.10.0-rc6-gb0eedd920017-dirty #67
Hardware name: linux,dummy-virt (DT)
pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __bpf_tramp_enter+0x58/0x190
lr : __bpf_tramp_enter+0xd8/0x190
sp : ffff800084c4bda0
x29: ffff800084c4bda0 x28: fff00000c274ae80 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
x23: 0000000060001000 x22: 0000ffffa36b7a54 x21: 00000000ffffffff
x20: ffff800082032938 x19: ffff0000c19c2400 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
x11: 0000000000020007 x10: 0000000000000007 x9 : 00000000ffffffff
x8 : 0000000000004008 x7 : ffff80008218fa78 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000086db7919 x3 : 0000000095481a34
x2 : 0000000000000001 x1 : fff00000c274ae80 x0 : 0000000000000001
Call trace:
__bpf_tramp_enter+0x58/0x190
bpf_trampoline_6442487232+0x44/0x158
__arm64_sys_nanosleep+0x8/0xf0
invoke_syscall+0x4c/0x118
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x4c/0x120
el0t_64_sync_handler+0xc0/0xc8
el0t_64_sync+0x190/0x198
Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00,00000006,8c13bd78,576676af
Memory Limit: none
Failed to run command
Caused by:
0: Failed to QGA guest-exec-status
1: error running guest_exec_status
2: Broken pipe (os error 32)
3: Broken pipe (os error 32)
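The two vmtest invocations in this mail differ only in the test name passed to test_progs. Below is a small sketch of building that command line for an arbitrary test. It only uses the vmtest flags that appear in the commands here (-k, -r, -a); the function name and its defaults are hypothetical, and it is meant to be run from inside the per-run directory (9859826851 in this case).

```python
import shlex

# Guest-side setup copied from the commands in this mail.
GUEST_SETUP = (
    "/bin/mount bpffs /sys/fs/bpf -t bpf && ip link set lo up && "
    "cd /mnt/vmtest/selftests/bpf/ && ./test_progs -v -t "
)


def build_vmtest_argv(
    test_name: str,
    kernel: str = "kbuild-output/arch/arm64/boot/Image.gz",
    rootfs: str = "../aarch64-rootfs",
    arch: str = "aarch64",
) -> list:
    """Return the argv for one vmtest run of a single selftest (hypothetical helper)."""
    return ["vmtest", "-k", kernel, "-r", rootfs, "-a", arch, GUEST_SETUP + test_name]


if __name__ == "__main__":
    for test in ("fexit_sleep", "fexit_bpf2bpf", "fentry_fexit"):
        # shlex.join quotes the guest command so the line can be pasted into a shell.
        print(shlex.join(build_vmtest_argv(test)))
```

The argv list can also be handed to subprocess.run() directly, which avoids the shell quoting altogether.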
[11:47:57] chantra@devvm17937:scratchpad $ ( cd 9859826851 && vmtest -k kbuild-output/arch/arm64/boot/Image.gz -r ../aarch64-rootfs -a aarch64 '/bin/mount bpffs /sys/fs/bpf -t bpf && ip link set lo up && cd /mnt/vmtest/selftests/bpf/ && ./test_progs -v -t fexit_bpf2bpf' )
=> Image.gz
===> Booting
===> Setting up VM
===> Running command
root@(none):/# bpf_testmod: loading out-of-tree module taints kernel.
bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
Unable to handle kernel paging request at virtual address ffff0000c278de68
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
[ffff0000c278de68] pgd=1000000042f28003, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in: bpf_testmod(OE)
CPU: 1 PID: 87 Comm: test_progs Tainted: G OE 6.10.0-rc6-gb0eedd920017-dirty #67
Hardware name: linux,dummy-virt (DT)
pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __bpf_tramp_enter+0x58/0x190
lr : __bpf_tramp_enter+0xd8/0x190
sp : ffff800084c4ba90
x29: ffff800084c4ba90 x28: ffff800080a32d10 x27: ffff800080a32d80
x26: ffff8000813e0ad8 x25: ffff800084c4bce4 x24: ffff800082fbd048
x23: 0000000000000001 x22: fff00000c2732e80 x21: fff00000c18a3200
x20: ffff800082032938 x19: ffff0000c278dc00 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaabcc22aa0
x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
x11: 0000000000000000 x10: 000000000ac0d5af x9 : 000000000ac0d5af
x8 : 00000000a4d7a457 x7 : ffff80008218fa78 x6 : 0000000000000000
x5 : 0000000000000002 x4 : 0000000006fa0785 x3 : 0000000081d7cd4c
x2 : 0000000000000202 x1 : fff00000c2732e80 x0 : 0000000000000001
Call trace:
__bpf_tramp_enter+0x58/0x190
bpf_trampoline_34359738386+0x44/0xf8
bpf_prog_3b052b77318ab7c4_test_pkt_md_access+0x8/0x118
bpf_test_run+0x200/0x3a0
bpf_prog_test_run_skb+0x328/0x6d8
__sys_bpf+0x844/0x2148
__arm64_sys_bpf+0x2c/0x48
invoke_syscall+0x4c/0x118
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x4c/0x120
el0t_64_sync_handler+0xc0/0xc8
el0t_64_sync+0x190/0x198
Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception in interrupt
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00,00000006,8c13bd78,576676af
Memory Limit: none
Failed to run command
Caused by:
0: Failed to QGA guest-exec-status
1: error running guest_exec_status
2: Broken pipe (os error 32)
3: Broken pipe (os error 32)
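All three oopses end with the same "Code:" line, and on arm64 the word in parentheses is the instruction at pc. Here is a rough sketch of decoding it, assuming the A64 "LDR Xt, [Xn, #imm]" unsigned-offset encoding (top ten bits 0x3E5, imm12 scaled by 8). decode_ldr_x_uoff is a hypothetical helper, not a kernel or binutils API. The sketch then checks that x19 plus the decoded offset gives the faulting address reported in each trace.

```python
def decode_ldr_x_uoff(insn: int) -> tuple:
    """Decode a 64-bit LDR (immediate, unsigned offset) instruction word.

    Returns (rt, rn, byte_offset). Raises ValueError for any other encoding.
    """
    if (insn >> 22) != 0x3E5:
        raise ValueError(f"{insn:#010x} is not LDR Xt, [Xn, #imm]")
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 8  # imm12 is scaled by the 8-byte access size


# (x19, faulting address) pairs copied from the three traces in this mail.
TRACES = {
    "fentry_fexit": (0xFFFF0000C2A80C00, 0xFFFF0000C2A80E68),
    "fexit_sleep": (0xFFFF0000C19C2400, 0xFFFF0000C19C2668),
    "fexit_bpf2bpf": (0xFFFF0000C278DC00, 0xFFFF0000C278DE68),
}

if __name__ == "__main__":
    rt, rn, off = decode_ldr_x_uoff(0xF9413660)  # word in parentheses on the Code: line
    print(f"ldr x{rt}, [x{rn}, #{off:#x}]")
    for name, (x19, fault) in TRACES.items():
        print(f"{name}: x19 + offset == fault address: {x19 + off == fault}")
```

So in each case the fault is the same load from x19 plus 0x268 in __bpf_tramp_enter. What x19 is supposed to point at, and why that address is unmapped, is not something this sketch can tell.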
[0] https://chantra.github.io/bpfcitools/bpfci-troubleshooting.html
>>> So, if the lack of Ftrace direct calls would be the reason then the
>>> failure would be due to fexit programs not being supported on arm64.
>>>
>>> But this says that the selftest never returns therefore is not related
>>> to ftrace direct call support but another bug?
>>
>> Fwiw, at least it is passing in the BPF CI now.
>>
>> https://github.com/kernel-patches/bpf/actions/runs/9841781347/job/27169610006