Re: [GIT PULL] ftrace: Fixes for v6.13

From: Mathieu Desnoyers
Date: Sun Dec 15 2024 - 09:39:41 EST


On 2024-12-15 08:47, Steven Rostedt wrote:
> On Sun, 15 Dec 2024 07:42:35 -0500
> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
>> On 2024-12-15 05:05, Steven Rostedt wrote:
>>> On Sat, 14 Dec 2024 21:19:01 -0800
>>> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>>> [...]
>>>>
>>>> Just disable it unconditionally.
>>>
>>> I can do that, but I'm not looking forward to seeing random crashes in the
>>> trace event code again :-(
>>>
>>> Honestly, I did not like this code when I wrote it, but I have no idea how
>>> to stop the "%s" bug from happening before it gets out to production. This
>>> worked. Do you have any suggestions for alternatives?
>>
>> IMHO, deferred execution of TP_printk() code in kernel context is
>> a fundamental mistake causing all those problems. This opens the
>> door to storing pointers to strings (or anything else, really)
>> that sit in kernel modules which can be unloaded between
>
> Module unloading will clear out the ring buffers to prevent issues.

As a side effect, issues caused by module unloading won't be
observable with tracing.

>
>> tracing and TP_printk() execution, or, as we are seeing here,
>> pointers to data which can be mapped at different addresses
>> across kernel reboots, into the ring buffer.
>>
>> If TP_printk() didn't have access to load data from random kernel
>> memory in the first place, and could only read from the buffer, we
>> would not be having those misuses, and there would be nothing to
>> work around as the strings/data would all be serialized into the
>> ring buffer.
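To make the serialization point concrete, here is a minimal userspace sketch in C. It is an analogy, not ftrace code: the record structs and helpers (struct ptr_record, struct ser_record, ser_record_fill(), ser_record_print(), demo_deferred()) are hypothetical. Overwriting a local buffer stands in for a module being unloaded and its memory reused. A record that stores only a pointer prints whatever that memory holds at formatting time. A record that copied the bytes prints what was there at tracing time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event that records only a pointer to the string. */
struct ptr_record {
	const char *name;	/* points at memory the tracer does not own */
};

/* Hypothetical event that serializes the string bytes into the record. */
struct ser_record {
	unsigned short len;	/* bytes copied, excluding the NUL */
	char name[32];		/* copy taken at tracing time */
};

/* Tracing time: copy the string into the record, truncating if needed. */
static void ser_record_fill(struct ser_record *rec, const char *s)
{
	size_t n;

	assert(rec != NULL && s != NULL);
	n = strlen(s);
	if (n >= sizeof(rec->name))
		n = sizeof(rec->name) - 1;
	memcpy(rec->name, s, n);
	rec->name[n] = '\0';
	rec->len = (unsigned short)n;
}

/* Deferred formatting that reads only bytes stored inside the record. */
static int ser_record_print(const struct ser_record *rec, char *out,
			    size_t outlen)
{
	return snprintf(out, outlen, "name=%.*s", (int)rec->len, rec->name);
}

/*
 * Records the same string both ways, then reuses the string's memory
 * (standing in for a module unload) before formatting either record.
 */
static void demo_deferred(char *ptr_out, char *ser_out, size_t outlen)
{
	char module_str[16] = "my_module";	/* stand-in for module data */
	struct ptr_record p = { .name = module_str };
	struct ser_record s;

	ser_record_fill(&s, module_str);

	/* The memory now holds something unrelated. */
	strcpy(module_str, "garbage");

	snprintf(ptr_out, outlen, "name=%s", p.name);	/* sees "garbage" */
	ser_record_print(&s, ser_out, outlen);		/* sees "my_module" */
}
```

Calling demo_deferred() leaves "name=garbage" in the first buffer and "name=my_module" in the second. Only the serialized record survives the reuse of the original memory.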

>> In LTTng we've taken the approach of only reading the trace data
>> during post-processing from user space (we don't have the equivalent
>> of TP_printk(), and that's on purpose).
>>
>> I wonder if we could keep the ftrace trace_pipe pretty-printing
>> behavior while isolating the TP_printk() execution in a
>> userspace process which would only map the ring buffer? This way,
>
> That would change the entire use of tracefs, especially in the embedded
> world. Note, this hasn't been a major issue since the test/check logic was
> put in place. It catches pretty much all issues with the delayed printing.
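The thread does not show the test/check logic itself. The following hypothetical sketch gives a rough idea of the kind of check involved. str_within_record() and checked_print() are illustrative names, not kernel functions. The sketch refuses to dereference a "%s" argument unless the string, including its terminating NUL, lies inside the event record being printed. The real check is assumed to be more involved, for example by also accepting other memory known to stay valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical check: true only if the string at s, including its
 * terminating NUL, lies entirely inside the record [rec, rec + rec_len).
 */
static bool str_within_record(const void *rec, size_t rec_len, const char *s)
{
	uintptr_t start = (uintptr_t)rec;
	uintptr_t end = start + rec_len;
	uintptr_t p = (uintptr_t)s;

	assert(rec != NULL);
	if (p < start || p >= end)
		return false;
	/* Search for the NUL without reading past the end of the record. */
	return memchr(s, '\0', (size_t)(end - p)) != NULL;
}

/* Formats a "%s" argument only after it passes the check above. */
static int checked_print(const void *rec, size_t rec_len, const char *s,
			 char *out, size_t outlen)
{
	if (!str_within_record(rec, rec_len, s))
		return snprintf(out, outlen, "(unsafe string pointer)");
	return snprintf(out, outlen, "%s", s);
}
```

A pointer into the record's own string field prints normally. A pointer to any other object, such as a string living in some module, is replaced by the marker text instead of being dereferenced at print time.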

This is not at all what I have in mind, so let me rephrase.

What I am saying is: is there a way we could execute TP_printk()
in userspace mode _while preserving the trace_pipe tracefs ABI_?

I suspect that inserting this small userspace program into the
kernel image with objcopy would be a start. Then adapting the
usermode helper code to run a program from a preexisting
in-kernel copy could be a second step. Then modifying trace_pipe
so it blocks and communicates with this helper program to
consume the formatted output would come last.
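The data flow of the last step, where trace_pipe hands raw records to a helper and reads back formatted text, can be approximated in userspace. In this minimal sketch, a fork()ed child stands in for the usermode helper and a pipe stands in for the ring buffer mapping. The parent plays the role of the trace_pipe reader. The record layout and function names are hypothetical. The actual proposal would map the ring buffer rather than copy records through a pipe, and would start the helper from the kernel rather than with fork().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical raw record: only serialized data, no pointers. */
struct raw_rec {
	int pid;
	char comm[16];
};

/* Helper side: sees nothing but the bytes it is handed. */
static void helper_loop(int in_fd, int out_fd)
{
	struct raw_rec r;
	char line[64];
	int n;

	while (read(in_fd, &r, sizeof(r)) == (ssize_t)sizeof(r)) {
		r.comm[sizeof(r.comm) - 1] = '\0';	/* do not trust the input */
		n = snprintf(line, sizeof(line), "pid=%d comm=%s\n", r.pid, r.comm);
		if (n < 0 || write(out_fd, line, (size_t)n) != n)
			break;
	}
}

/*
 * "trace_pipe" side: starts the helper, hands it the raw records and
 * collects the formatted text into out (always NUL-terminated).
 * Writing all input before reading can deadlock if the helper's output
 * fills its pipe first; small inputs like the one here are fine.
 */
static ssize_t format_via_helper(const struct raw_rec *recs, size_t nrecs,
				 char *out, size_t outlen)
{
	int to_helper[2], from_helper[2];
	char chunk[256];
	size_t total = 0, i;
	ssize_t n;
	pid_t pid;

	assert(outlen > 0);
	if (pipe(to_helper) < 0)
		return -1;
	if (pipe(from_helper) < 0) {
		close(to_helper[0]);
		close(to_helper[1]);
		return -1;
	}
	pid = fork();
	if (pid < 0) {
		close(to_helper[0]);
		close(to_helper[1]);
		close(from_helper[0]);
		close(from_helper[1]);
		return -1;
	}
	if (pid == 0) {			/* child: the stand-in helper */
		close(to_helper[1]);
		close(from_helper[0]);
		helper_loop(to_helper[0], from_helper[1]);
		_exit(0);
	}
	close(to_helper[0]);
	close(from_helper[1]);
	for (i = 0; i < nrecs; i++)
		if (write(to_helper[1], &recs[i], sizeof(recs[i])) !=
		    (ssize_t)sizeof(recs[i]))
			break;
	close(to_helper[1]);		/* EOF lets the helper finish */

	/* Drain all output; keep as much as fits in out. */
	while ((n = read(from_helper[0], chunk, sizeof(chunk))) > 0) {
		size_t room = outlen - 1 - total;
		size_t copy = (size_t)n < room ? (size_t)n : room;

		memcpy(out + total, chunk, copy);
		total += copy;
	}
	out[total] = '\0';
	close(from_helper[0]);
	waitpid(pid, NULL, 0);
	return (ssize_t)total;
}
```

Fed the records { 42, "bash" } and { 7, "kworker/0:1" }, format_via_helper() returns the text "pid=42 comm=bash\npid=7 comm=kworker/0:1\n". The formatting code never has access to anything but the records themselves.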

Thanks,

Mathieu


> -- Steve
>
>
>> users trying to misuse TP_printk() would get immediate feedback
>> about their mistake because they cannot print the trace. We could
>> print a dmesg warning about the crash of a usermode helper program,
>> for instance.
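Once the formatter runs in its own process, the feedback path amounts to noticing that the process died. Here is a minimal sketch, assuming the formatter is a callback run in a fork()ed child. The parent uses waitpid() and WIFSIGNALED() to detect the crash and logs a warning, standing in for the dmesg message. struct demo_rec, fmt_ok(), fmt_crash() and run_formatter_isolated() are hypothetical. fmt_crash() calls abort() to simulate the crash deterministically rather than dereferencing a bad pointer.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical event record and formatter callbacks (TP_printk() stand-ins). */
struct demo_rec {
	int pid;
};

typedef void (*fmt_fn)(const struct demo_rec *rec);

static void fmt_ok(const struct demo_rec *rec)
{
	printf("pid=%d\n", rec->pid);
	fflush(stdout);		/* the child leaves via _exit(), which skips stdio flushing */
}

static void fmt_crash(const struct demo_rec *rec)
{
	(void)rec;
	abort();		/* simulates a formatter hitting a bad pointer */
}

/*
 * Runs fmt in a child process. Returns 0 if it finished normally, the
 * signal number if it was killed by a signal (after logging a warning),
 * or -1 if the child could not be started or waited for.
 */
static int run_formatter_isolated(fmt_fn fmt, const struct demo_rec *rec,
				  FILE *log)
{
	int status;
	pid_t pid;

	assert(fmt != NULL && log != NULL);
	fflush(NULL);		/* avoid duplicating buffered output in the child */
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		fmt(rec);
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFSIGNALED(status)) {
		fprintf(log, "warning: trace formatter killed by signal %d\n",
			WTERMSIG(status));
		return WTERMSIG(status);
	}
	return 0;
}
```

Running fmt_ok() returns 0. Running fmt_crash() logs the warning and returns SIGABRT, while the calling process keeps going.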

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com