[REGRESSION][BISECTED] Re: 6.12.7 stable new error: event xe_bo_move has unsafe dereference of argument 4
From: Genes Lists
Date: Mon Dec 30 2024 - 13:55:18 EST
On Fri, 2024-12-27 at 11:15 -0500, Genes Lists wrote:
> I have not had a chance to bisect this yet but since its in stable
> thought it best to share sooner than later.
>
> If noone spots anything, I will do a bisect soon as I can. Boot
> completes and aside from this error things seem fine.
>
> - Does not happen with 6.12.6.
> - Does happen with Mainline + patch for different drm bug
> mainline commit : d6ef8b40d075c425f548002d2f35ae3f06e9cf96
> Patch: https://patchwork.freedesktop.org/series/141911/
> Patch fixes drm bug reported here (patch not yet in mainline):
>
> https://lore.kernel.org/lkml/3b097dddd7095bccabe6791b90899c689f271a35
> .camel@xxxxxxxxxxxx/
>
> Graphics h/w:
> 00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P
> [Iris Xe Graphics] (rev 04)
>
> Trace below and full dmesg attached.
>
> gene
>
> [ +0.143567] ------------[ cut here ]------------
> [ +0.000003] WARNING: CPU: 6 PID: 344 at
> kernel/trace/trace_events.c:577 trace_event_raw_init+0x159/0x660
> [ +0.000005] Modules linked in: xe(+) drm_ttm_helper gpu_sched
> drm_suballoc_helper drm_gpuvm drm_exec typec_displayport usbhid
> hid_sensor_custom hid_sensor_hub intel_ishtp_hid nvme
> spi_pxa2xx_platform hid_multitouch dw_dmac nvme_core 8250_dw
> spi_pxa2xx_core i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
> crc32c_intel psmouse polyval_clmulni serio_raw polyval_generic
> drm_buddy atkbd ghash_clmulni_intel libps2 vivaldi_fmap ttm
> sha512_ssse3 sha256_ssse3 intel_gtt sha1_ssse3 aesni_intel gf128mul
> drm_display_helper ucsi_acpi crypto_simd typec_ucsi intel_lpss_pci
> intel_ish_ipc cryptd intel_lpss roles intel_ishtp idma64 cec vmd
> typec
> i2c_hid_acpi i8042 video i2c_hid serio wmi pkcs8_key_parser wireguard
> curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64
> ip6_udp_tunnel udp_tunnel libcurve25519_generic libchacha loop sg
> crypto_user
> [ +0.000022] CPU: 6 UID: 0 PID: 344 Comm: (udev-worker) Not tainted
> 6.12.7-stable-1 #16 8ef5eca39d52f9e2c5a50ace761c4a6858369a18
> [ +0.000003] Hardware name: Dell Inc. XPS 9320/0CR6NC, BIOS 2.16.1
> 09/11/2024
> [ +0.000001] RIP: 0010:trace_event_raw_init+0x159/0x660
> [ +0.000002] Code: 89 ea 0f 83 3b 04 00 00 e8 84 db ff ff 84 c0 74
> 10
> 8b 0c 24 48 c7 c0 fe ff ff ff 48 d3 c0 49 21 c6 4d 85 f6 0f 84 d6 fe
> ff
> ff <0f> 0b bb 01 00 00 00 41 f6 c6 01 0f 85 f3 78 bf 00 66 0f 1f 44
> 00
> [ +0.000001] RSP: 0018:ffffa3c8007bb980 EFLAGS: 00010206
> [ +0.000001] RAX: ffffffffffffffdf RBX: ffffffffc0ef8731 RCX:
> 0000000000000005
> [ +0.000001] RDX: 0000000000000002 RSI: 0000000000000001 RDI:
> ffffffffc0ef8727
> [ +0.000001] RBP: ffffffffc0ef8640 R08: 0000000000000039 R09:
> 0000000000000000
> [ +0.000000] R10: 0000000000000076 R11: 000000000000004e R12:
> 00000000000000f2
> [ +0.000001] R13: ffffffffc0ef9760 R14: 0000000000000018 R15:
> 0000000000000000
> [ +0.000001] FS: 000075f6c10d0880(0000) GS:ffff96b92f500000(0000)
> knlGS:0000000000000000
> [ +0.000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ +0.000000] CR2: 00007be641f52f70 CR3: 000000010cffe000 CR4:
> 0000000000f50ef0
> [ +0.000001] PKRU: 55555554
> [ +0.000001] Call Trace:
> [ +0.000001] <TASK>
> [ +0.000000] ? trace_event_raw_init+0x159/0x660
> [ +0.000002] ? __warn.cold+0x93/0xf6
> [ +0.000001] ? trace_event_raw_init+0x159/0x660
> [ +0.000003] ? report_bug+0xff/0x140
> [ +0.000002] ? handle_bug+0x58/0x90
> [ +0.000001] ? exc_invalid_op+0x17/0x70
> [ +0.000002] ? asm_exc_invalid_op+0x1a/0x20
> [ +0.000003] ? trace_event_raw_init+0x159/0x660
> [ +0.000002] event_init+0x28/0x70
> [ +0.000002] trace_module_notify+0x19a/0x250
> [ +0.000001] notifier_call_chain+0x5a/0xd0
> [ +0.000002] blocking_notifier_call_chain_robust+0x65/0xc0
> [ +0.000002] load_module+0x1818/0x1ce0
> [ +0.000004] ? init_module_from_file+0x89/0xe0
> [ +0.000001] init_module_from_file+0x89/0xe0
> [ +0.000002] idempotent_init_module+0x11e/0x310
> [ +0.000002] __x64_sys_finit_module+0x5e/0xb0
> [ +0.000002] do_syscall_64+0x82/0x160
> [ +0.000002] ? syscall_exit_to_user_mode+0x10/0x200
> [ +0.000001] ? do_syscall_64+0x8e/0x160
> [ +0.000001] ? syscall_exit_to_user_mode+0x10/0x200
> [ +0.000001] ? do_syscall_64+0x8e/0x160
> [ +0.000001] ? do_sys_openat2+0x9c/0xe0
> [ +0.000002] ? syscall_exit_to_user_mode+0x10/0x200
> [ +0.000001] ? do_syscall_64+0x8e/0x160
> [ +0.000001] ? do_sys_openat2+0x9c/0xe0
> [ +0.000002] ? syscall_exit_to_user_mode+0x10/0x200
> [ +0.000001] ? do_syscall_64+0x8e/0x160
> [ +0.000001] ? exc_page_fault+0x7e/0x180
> [ +0.000001] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ +0.000002] RIP: 0033:0x75f6c18c71fd
> [ +0.000011] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
> fa
> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08
> 0f
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 fa 0c 00 f7 d8 64 89 01
> 48
> [ +0.000001] RSP: 002b:00007ffdec463948 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000139
> [ +0.000001] RAX: ffffffffffffffda RBX: 000057194ec1c5c0 RCX:
> 000075f6c18c71fd
> [ +0.000001] RDX: 0000000000000004 RSI: 000075f6c10ca05d RDI:
> 000000000000002b
> [ +0.000000] RBP: 00007ffdec463a00 R08: 0000000000000001 R09:
> 00007ffdec463990
> [ +0.000001] R10: 0000000000000040 R11: 0000000000000246 R12:
> 000075f6c10ca05d
> [ +0.000000] R13: 0000000000020000 R14: 000057194ec1dc40 R15:
> 000057194ec1e900
> [ +0.000001] </TASK>
> [ +0.000001] ---[ end trace 0000000000000000 ]---
> [ +0.000000] event xe_bo_move has unsafe dereference of argument 4
> [ +0.000001] print_fmt: "move_lacks_source:%s, migrate object %p
> [size
> %zu] from %s to %s device_id:%s", REC->move_lacks_source ? "yes" :
> "no", REC->bo, REC->size, xe_mem_type_to_name[REC->old_placement],
> xe_mem_type_to_name[REC->new_placement], __get_str(device_id)
>
1) rc5 has the same issue.
2) Result of git bisect starting with 6.12.6 is :
gene
git bisect good
683eccacc02d2eb25d1c34b8fb0363fcc7e08f64 is the first bad commit
commit 683eccacc02d2eb25d1c34b8fb0363fcc7e08f64
Author: Steven Rostedt <rostedt@xxxxxxxxxxx>
Date: Mon Dec 16 21:41:21 2024 -0500
tracing: Add "%s" check in test_event_printk()
commit 65a25d9f7ac02e0cf361356e834d1c71d36acca9 upstream.
The test_event_printk() code makes sure that when a trace event is
registered, any dereferenced pointers in from the event's
TP_printk() are
pointing to content in the ring buffer. But currently it does not
handle
"%s", as there's cases where the string pointer saved in the ring
buffer
points to a static string in the kernel that will never be freed.
As that
is a valid case, the pointer needs to be checked at runtime.
Currently the runtime check is done via trace_check_vprintf(), but
to not
have to replicate everything in vsnprintf() it does some logic with
the
va_list that may not be reliable across architectures. In order to
get rid
of that logic, more work in the test_event_printk() needs to be
done. Some
of the strings can be validated at this time when it is obvious the
string
is valid because the string will be saved in the ring buffer
content.
Do all the validation of strings in the ring buffer at boot in
test_event_printk(), and make sure that the field of the strings
that
point into the kernel are accessible. This will allow adding checks
at
runtime that will validate the fields themselves and not rely on
paring
the TP_printk() format at runtime.
Cc: stable@xxxxxxxxxxxxxxx
Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Link: https://lore.kernel.org/20241217024720.685917008@xxxxxxxxxxx
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts
for dereferencing pointers")
Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
kernel/trace/trace_events.c | 104
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
--------
1 file changed, 89 insertions(+), 15 deletions(-)
3) The git bisect log is:
git bisect start
# status: waiting for both good and bad commits
# good: [e9d65b48ce1aba50e9ec7eab6d9f73d1ba88420e] Linux 6.12.6
git bisect good e9d65b48ce1aba50e9ec7eab6d9f73d1ba88420e
# bad: [319addc2ad901dac4d6cc931d77ef35073e0942f] Linux 6.12.7
git bisect bad 319addc2ad901dac4d6cc931d77ef35073e0942f
# good: [ced29f2149c9b65ca01008ba64d9db993ad8fad1] USB: serial:
option: add Telit FE910C04 rmnet compositions
git bisect good ced29f2149c9b65ca01008ba64d9db993ad8fad1
# good: [b2f23a1c7181f8c562ea06d3ea14879334d6bdf4] drm/amdgpu/gfx12:
fix IP version check
git bisect good b2f23a1c7181f8c562ea06d3ea14879334d6bdf4
# bad: [2976661ed4de10bd1945e7cca0d7363b44f240dd] of/irq: Fix
interrupt-map cell length check in of_irq_parse_imap_parent()
git bisect bad 2976661ed4de10bd1945e7cca0d7363b44f240dd
# good: [3985cd30472a6a67bfe3f0b42ad23b59fa824033] tracing: Fix
test_event_printk() to process entire print argument
git bisect good 3985cd30472a6a67bfe3f0b42ad23b59fa824033
# bad: [416226eb3f3a3008456903fc3695b0efa8ceafa1] selftests/bpf: Use
asm constraint "m" for LoongArch
git bisect bad 416226eb3f3a3008456903fc3695b0efa8ceafa1
# bad: [683eccacc02d2eb25d1c34b8fb0363fcc7e08f64] tracing: Add "%s"
check in test_event_printk()
git bisect bad 683eccacc02d2eb25d1c34b8fb0363fcc7e08f64
# good: [71a613ceac1ad6235e164aea34abcddb258296af] tracing: Add
missing helper functions in event pointer dereference check
git bisect good 71a613ceac1ad6235e164aea34abcddb258296af
# first bad commit: [683eccacc02d2eb25d1c34b8fb0363fcc7e08f64]
tracing: Add "%s" check in test_event_printk()
--
Gene
Attachment:
signature.asc
Description: This is a digitally signed message part