Re: [REGRESSION][BISECTED] Re: 6.12.7 stable new error: event xe_bo_move has unsafe dereference of argument 4

From: Steven Rostedt
Date: Mon Dec 30 2024 - 14:13:38 EST


On Mon, 30 Dec 2024 13:55:00 -0500
Genes Lists <lists@xxxxxxxxxxxx> wrote:

> On Fri, 2024-12-27 at 11:15 -0500, Genes Lists wrote:
> > I have not had a chance to bisect this yet but since its in stable
> > thought it best to share sooner than later.
> >
> > If noone spots anything, I will do a bisect soon as I can. Boot
> > completes and aside from this error things seem fine.
> >
> >  - Does not happen with 6.12.6.
> >  - Does happen with Mainline + patch for different drm bug
> >     mainline commit : d6ef8b40d075c425f548002d2f35ae3f06e9cf96
> >     Patch: https://patchwork.freedesktop.org/series/141911/
> >     Patch fixes drm bug reported here (patch not yet in mainline):
> >       
> > https://lore.kernel.org/lkml/3b097dddd7095bccabe6791b90899c689f271a35
> > .camel@xxxxxxxxxxxx/
> >
> > Graphics h/w:
> >   00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P
> > [Iris Xe Graphics] (rev 04)
> >
> > Trace below and full dmesg attached.
> >
> > gene
> >
> > [  +0.143567] ------------[ cut here ]------------
> > [  +0.000003] WARNING: CPU: 6 PID: 344 at
> > kernel/trace/trace_events.c:577 trace_event_raw_init+0x159/0x660
> > [  +0.000005] Modules linked in: xe(+) drm_ttm_helper gpu_sched
> > drm_suballoc_helper drm_gpuvm drm_exec typec_displayport usbhid
> > hid_sensor_custom hid_sensor_hub intel_ishtp_hid nvme
> > spi_pxa2xx_platform hid_multitouch dw_dmac nvme_core 8250_dw
> > spi_pxa2xx_core i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
> > crc32c_intel psmouse polyval_clmulni serio_raw polyval_generic
> > drm_buddy atkbd ghash_clmulni_intel libps2 vivaldi_fmap ttm
> > sha512_ssse3 sha256_ssse3 intel_gtt sha1_ssse3 aesni_intel gf128mul
> > drm_display_helper ucsi_acpi crypto_simd typec_ucsi intel_lpss_pci
> > intel_ish_ipc cryptd intel_lpss roles intel_ishtp idma64 cec vmd
> > typec
> > i2c_hid_acpi i8042 video i2c_hid serio wmi pkcs8_key_parser wireguard
> > curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64
> > ip6_udp_tunnel udp_tunnel libcurve25519_generic libchacha loop sg
> > crypto_user
> > [  +0.000022] CPU: 6 UID: 0 PID: 344 Comm: (udev-worker) Not tainted
> > 6.12.7-stable-1 #16 8ef5eca39d52f9e2c5a50ace761c4a6858369a18
> > [  +0.000003] Hardware name: Dell Inc. XPS 9320/0CR6NC, BIOS 2.16.1
> > 09/11/2024
> > [  +0.000001] RIP: 0010:trace_event_raw_init+0x159/0x660
> > [  +0.000002] Code: 89 ea 0f 83 3b 04 00 00 e8 84 db ff ff 84 c0 74
> > 10
> > 8b 0c 24 48 c7 c0 fe ff ff ff 48 d3 c0 49 21 c6 4d 85 f6 0f 84 d6 fe
> > ff
> > ff <0f> 0b bb 01 00 00 00 41 f6 c6 01 0f 85 f3 78 bf 00 66 0f 1f 44
> > 00
> > [  +0.000001] RSP: 0018:ffffa3c8007bb980 EFLAGS: 00010206
> > [  +0.000001] RAX: ffffffffffffffdf RBX: ffffffffc0ef8731 RCX:
> > 0000000000000005
> > [  +0.000001] RDX: 0000000000000002 RSI: 0000000000000001 RDI:
> > ffffffffc0ef8727
> > [  +0.000001] RBP: ffffffffc0ef8640 R08: 0000000000000039 R09:
> > 0000000000000000
> > [  +0.000000] R10: 0000000000000076 R11: 000000000000004e R12:
> > 00000000000000f2
> > [  +0.000001] R13: ffffffffc0ef9760 R14: 0000000000000018 R15:
> > 0000000000000000
> > [  +0.000001] FS:  000075f6c10d0880(0000) GS:ffff96b92f500000(0000)
> > knlGS:0000000000000000
> > [  +0.000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  +0.000000] CR2: 00007be641f52f70 CR3: 000000010cffe000 CR4:
> > 0000000000f50ef0
> > [  +0.000001] PKRU: 55555554
> > [  +0.000001] Call Trace:
> > [  +0.000001]  <TASK>
> > [  +0.000000]  ? trace_event_raw_init+0x159/0x660
> > [  +0.000002]  ? __warn.cold+0x93/0xf6
> > [  +0.000001]  ? trace_event_raw_init+0x159/0x660
> > [  +0.000003]  ? report_bug+0xff/0x140
> > [  +0.000002]  ? handle_bug+0x58/0x90
> > [  +0.000001]  ? exc_invalid_op+0x17/0x70
> > [  +0.000002]  ? asm_exc_invalid_op+0x1a/0x20
> > [  +0.000003]  ? trace_event_raw_init+0x159/0x660
> > [  +0.000002]  event_init+0x28/0x70
> > [  +0.000002]  trace_module_notify+0x19a/0x250
> > [  +0.000001]  notifier_call_chain+0x5a/0xd0
> > [  +0.000002]  blocking_notifier_call_chain_robust+0x65/0xc0
> > [  +0.000002]  load_module+0x1818/0x1ce0
> > [  +0.000004]  ? init_module_from_file+0x89/0xe0
> > [  +0.000001]  init_module_from_file+0x89/0xe0
> > [  +0.000002]  idempotent_init_module+0x11e/0x310
> > [  +0.000002]  __x64_sys_finit_module+0x5e/0xb0
> > [  +0.000002]  do_syscall_64+0x82/0x160
> > [  +0.000002]  ? syscall_exit_to_user_mode+0x10/0x200
> > [  +0.000001]  ? do_syscall_64+0x8e/0x160
> > [  +0.000001]  ? syscall_exit_to_user_mode+0x10/0x200
> > [  +0.000001]  ? do_syscall_64+0x8e/0x160
> > [  +0.000001]  ? do_sys_openat2+0x9c/0xe0
> > [  +0.000002]  ? syscall_exit_to_user_mode+0x10/0x200
> > [  +0.000001]  ? do_syscall_64+0x8e/0x160
> > [  +0.000001]  ? do_sys_openat2+0x9c/0xe0
> > [  +0.000002]  ? syscall_exit_to_user_mode+0x10/0x200
> > [  +0.000001]  ? do_syscall_64+0x8e/0x160
> > [  +0.000001]  ? exc_page_fault+0x7e/0x180
> > [  +0.000001]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [  +0.000002] RIP: 0033:0x75f6c18c71fd
> > [  +0.000011] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
> > fa
> > 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08
> > 0f
> > 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 fa 0c 00 f7 d8 64 89 01
> > 48
> > [  +0.000001] RSP: 002b:00007ffdec463948 EFLAGS: 00000246 ORIG_RAX:
> > 0000000000000139
> > [  +0.000001] RAX: ffffffffffffffda RBX: 000057194ec1c5c0 RCX:
> > 000075f6c18c71fd
> > [  +0.000001] RDX: 0000000000000004 RSI: 000075f6c10ca05d RDI:
> > 000000000000002b
> > [  +0.000000] RBP: 00007ffdec463a00 R08: 0000000000000001 R09:
> > 00007ffdec463990
> > [  +0.000001] R10: 0000000000000040 R11: 0000000000000246 R12:
> > 000075f6c10ca05d
> > [  +0.000000] R13: 0000000000020000 R14: 000057194ec1dc40 R15:
> > 000057194ec1e900
> > [  +0.000001]  </TASK>
> > [  +0.000001] ---[ end trace 0000000000000000 ]---
> > [  +0.000000] event xe_bo_move has unsafe dereference of argument 4
> > [  +0.000001] print_fmt: "move_lacks_source:%s, migrate object %p
> > [size
> > %zu] from %s to %s device_id:%s", REC->move_lacks_source ? "yes" :
> > "no", REC->bo, REC->size, xe_mem_type_to_name[REC->old_placement],
> > xe_mem_type_to_name[REC->new_placement], __get_str(device_id)
> >

Yep. That's a false positive :-(

We stopped doing runtime checks of '%s' as it required using va_list
"hacks", so the checks have been moved to the boot up code. The above has:

TRACE_EVENT(xe_bo_move,
TP_PROTO(struct xe_bo *bo, uint32_t new_placement, uint32_t old_placement,
bool move_lacks_source),
TP_ARGS(bo, new_placement, old_placement, move_lacks_source),
TP_STRUCT__entry(
__field(struct xe_bo *, bo)
__field(size_t, size)
__field(u32, new_placement)
__field(u32, old_placement)
__string(device_id, __dev_name_bo(bo))
__field(bool, move_lacks_source)
),

TP_fast_assign(
__entry->bo = bo;
__entry->size = bo->size;
__entry->new_placement = new_placement;
__entry->old_placement = old_placement;
__assign_str(device_id);
__entry->move_lacks_source = move_lacks_source;
),
TP_printk("move_lacks_source:%s, migrate object %p [size %zu] from %s to %s device_id:%s",
__entry->move_lacks_source ? "yes" : "no", __entry->bo, __entry->size,
xe_mem_type_to_name[__entry->old_placement],
xe_mem_type_to_name[__entry->new_placement], __get_str(device_id))

Where the xe_mem_type_to_name is an array of strings, that I assume
will always be there for the file of the module.

I guess the "fix" would be to have the check code ignore pointer to
arrays, assuming they are "ok".

The check is mostly to catch the naive use of:

TP_fast_assign(
__entry->str = passed_in_string;
)

TP_printk("str=%s", __entry->str)

Which was 99% of the bugs that this would catch. I would think if we have:

TP_fast_assign(
__entry->idx = passed_in_idx;
)

TP_printk("str=%s", some_array[__entry->idx])

Would 99% be OK. For the 1% it's not, oh well, that would be a false
negative. Still better than not having this check.

-- Steve


>
> 1) rc5 has the same issue.
>
> 2) Result of git bisect starting with 6.12.6 is :
>
> gene
>
>
> git bisect good
> 683eccacc02d2eb25d1c34b8fb0363fcc7e08f64 is the first bad commit
> commit 683eccacc02d2eb25d1c34b8fb0363fcc7e08f64
> Author: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Date: Mon Dec 16 21:41:21 2024 -0500
>
> tracing: Add "%s" check in test_event_printk()
>
> commit 65a25d9f7ac02e0cf361356e834d1c71d36acca9 upstream.
>
> The test_event_printk() code makes sure that when a trace event is
> registered, any dereferenced pointers in from the event's
> TP_printk() are
> pointing to content in the ring buffer. But currently it does not
> handle
> "%s", as there's cases where the string pointer saved in the ring
> buffer
> points to a static string in the kernel that will never be freed.
> As that
> is a valid case, the pointer needs to be checked at runtime.
>
> Currently the runtime check is done via trace_check_vprintf(), but
> to not
> have to replicate everything in vsnprintf() it does some logic with
> the
> va_list that may not be reliable across architectures. In order to
> get rid
> of that logic, more work in the test_event_printk() needs to be
> done. Some
> of the strings can be validated at this time when it is obvious the
> string
> is valid because the string will be saved in the ring buffer
> content.
>
> Do all the validation of strings in the ring buffer at boot in
> test_event_printk(), and make sure that the field of the strings
> that
> point into the kernel are accessible. This will allow adding checks
> at
> runtime that will validate the fields themselves and not rely on
> paring
> the TP_printk() format at runtime.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Link: https://lore.kernel.org/20241217024720.685917008@xxxxxxxxxxx
> Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts
> for dereferencing pointers")
> Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>
> kernel/trace/trace_events.c | 104
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
> --------
> 1 file changed, 89 insertions(+), 15 deletions(-)
>
> 3) The git bisect log is:
>
> git bisect start
> # status: waiting for both good and bad commits
> # good: [e9d65b48ce1aba50e9ec7eab6d9f73d1ba88420e] Linux 6.12.6
> git bisect good e9d65b48ce1aba50e9ec7eab6d9f73d1ba88420e
> # bad: [319addc2ad901dac4d6cc931d77ef35073e0942f] Linux 6.12.7
> git bisect bad 319addc2ad901dac4d6cc931d77ef35073e0942f
> # good: [ced29f2149c9b65ca01008ba64d9db993ad8fad1] USB: serial:
> option: add Telit FE910C04 rmnet compositions
> git bisect good ced29f2149c9b65ca01008ba64d9db993ad8fad1
> # good: [b2f23a1c7181f8c562ea06d3ea14879334d6bdf4] drm/amdgpu/gfx12:
> fix IP version check
> git bisect good b2f23a1c7181f8c562ea06d3ea14879334d6bdf4
> # bad: [2976661ed4de10bd1945e7cca0d7363b44f240dd] of/irq: Fix
> interrupt-map cell length check in of_irq_parse_imap_parent()
> git bisect bad 2976661ed4de10bd1945e7cca0d7363b44f240dd
> # good: [3985cd30472a6a67bfe3f0b42ad23b59fa824033] tracing: Fix
> test_event_printk() to process entire print argument
> git bisect good 3985cd30472a6a67bfe3f0b42ad23b59fa824033
> # bad: [416226eb3f3a3008456903fc3695b0efa8ceafa1] selftests/bpf: Use
> asm constraint "m" for LoongArch
> git bisect bad 416226eb3f3a3008456903fc3695b0efa8ceafa1
> # bad: [683eccacc02d2eb25d1c34b8fb0363fcc7e08f64] tracing: Add "%s"
> check in test_event_printk()
> git bisect bad 683eccacc02d2eb25d1c34b8fb0363fcc7e08f64
> # good: [71a613ceac1ad6235e164aea34abcddb258296af] tracing: Add
> missing helper functions in event pointer dereference check
> git bisect good 71a613ceac1ad6235e164aea34abcddb258296af
> # first bad commit: [683eccacc02d2eb25d1c34b8fb0363fcc7e08f64]
> tracing: Add "%s" check in test_event_printk()
>
>