Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
From: Andy Chiu
Date: Sun Dec 29 2024 - 14:09:57 EST
Steven Rostedt <rostedt@xxxxxxxxxxx> 於 2024年12月24日 週二 上午11:15寫道:
>
> On Wed, 27 Nov 2024 22:25:57 +0100
> Björn Töpel <bjorn@xxxxxxxxxx> wrote:
>
> > Adding Steven.
>
> And this has been in my draft folder for almost a month :-p
>
> I kept coming to this email, but then got distracted by something else.
>
> >
> > Andy Chiu <andybnac@xxxxxxxxx> writes:
> >
> > > This series makes atmoic code patching possible in riscv ftrace. A
> > > direct benefit of this is that we can get rid of stop_machine() when
> > > patching function entries. This also makes it possible to run ftrace
> > > with full kernel preemption. Before this series, the kernel initializes
> > > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > > entries to AUIPC + JALR while holding other cores in stop_machine.
> > > stop_machine() is required because it is impossible to update 2
> > > instructions, and be seen atomically. And preemption must have to be
> > > prevented, as kernel preemption allows process to be scheduled out while
> > > executing on one of these instruction pairs.
> > >
> > > This series addresses the problem by initializing the first NOP4 to
> > > AUIPC. So, atmoic patching is possible because the kernel only has to
> > > update one instruction. As long as the instruction is naturally aligned,
> > > then it is expected to be updated atomically.
> > >
> > > However, the address range of the ftrace trampoline is limited to +-2K
> > > from ftrace_caller after appplying this series. This issue is expected
> > > to be solved by Puranjay's CALL_OPS, where it adds 8B naturally align
> > > data in front of pacthable functions and can use it to direct execution
> > > out to any custom trampolines.
> > >
> > > The series is composed by three parts. The first part cleans up the
> > > existing issues when the kernel is compiled with clang.The second part
> > > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > > Then prepare ftrace to be able to run with kernel preemption (5,6)
> > >
> > > An ongoing fix:
> > >
> > > Since there is no room for marking *kernel_text_address as notrace[1] at
> > > source code level, there is a significant performance regression when
> > > using function_graph with TRACE_IRQFLAGS enabled. There can be as much as
> > > 8 graph handler being called in each function-entry. The current
> > > workaround requires us echo "*kernel_text_address" into
> > > set_ftrace_notrace before starting the trace. However, we observed that
> > > the kernel still enables the patch site in some cases even with
> > > *kernel_text_address properly added in the file While the root cause is
> > > still under investagtion, we consider that it should not be the reason
> > > for holding back the code patching, in order to unblock the call_ops
> > > part.
> >
> > Maybe Steven knows this from the top of his head!
> >
> > As Andy points out, "*kernel_text_address" is used in the stack
> > unwinding on RISC-V. So, if you do a tracing without filtering *and*
> > TRACE_IRQFLAGS, one will drown in traces.
>
> I tested set_ftrace_notrace on x86 and the function graph tracer does honor
> it. I wonder if there's a kernel
After checking the log buffer, I can confirm that riscv does also
honor set_ftrace_notrace. It does not print out "*kernel_text_address"
in the trace buffer. Sorry for not making this clear.
However, the problem is that the patch sites "*kernel_text_address"
were enabled, allowing the code flow into ftrace and parts of fgraph.
In particular, with TRACE_IRQFLAGS enabled, the functions are called
extensively, causing the significant slow down.
IIUC, it is reasonable to enable a patch site if there is at least one
tracing instance that request the function. However, in another
experiment, the patch sites of "*kernel_text_address" are enabled
even when all instances have them disabled. Here is a way to reproduce
it:
cd /sys/kernel/
mount -t tracefs t tracing/
cd tracing/
mkdir instances/1
mkdir instances/2
cd instances/1/
echo *kernel_text_address > set_ftrace_notrace
echo *sched* > set_ftrace_filter # this is just to make the kernel
more responsive
cat set_ftrace_notrace
echo function_graph > current_tracer
cd ../2/
echo *kernel_text_address > set_ftrace_notrace
cat set_ftrace_notrace
echo function_graph > current_tracer
To figure out this problem, it is probably worth trying to understand
why the patch site is enabled after the above sequence:
It seems like the ftrace_ops being passed into
__ftrace_hash_rec_update() at the start of the second tracer has an
empty notrace hash. This leads to the eventual activation of the patch
sites. However, graph_ops does not have the empty notrace hash when
the user starts the first graph tracer. I am trying to understand this
part of the code, and the relationship between graph_ops and ops in
the fgraph_array.
>
> >
> > E.g. the ftrace selftest:
> > | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
> >
> > will generate a lot of traces.
> >
> > Now, if we add:
> > --8<--
> > diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > index ff88f97e41fb..4f30a4d81d99 100644
> > --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > @@ -84,6 +84,7 @@ cd $INSTANCE2
> > do_test '*rcu*' 'rcu'
> > cd $WD
> > cd $INSTANCE3
> > +echo '*kernel_text_address' > set_ftrace_notrace
> > echo function_graph > current_tracer
> >
> > sleep 1
> > -->8--
> >
> > The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
> > but still enable the *kernel_text_address traces. (Note that there are
> > no filters in the test, so *all* ftrace recs will be enabled.)
> >
> > Are we holding the graph tracer wrong?
>
> What do you get when you do:
>
> # grep kernel_text_address available_filter_functions
>
> ?
We get:
kernel_text_address
__kernel_text_address
>
> >
> >
> > Happy thanksgiving!
> > Björn
>
> And Merry Christmas!
>
> -- Steve
>
And Happy New Year!
Andy