Re: [PATCH 1/2] tracing: Simplify and fix "buffered event" synchronization

From: Petr Pavlu
Date: Fri Dec 01 2023 - 09:17:43 EST


On 11/29/23 15:58, Steven Rostedt wrote:
> On Wed, 29 Nov 2023 10:22:23 +0100
> Petr Pavlu <petr.pavlu@xxxxxxxx> wrote:
>
>> Yes, I believe this should address this potential race condition.
>>
>> An alternative would be instead to update
>> trace_event_buffer_lock_reserve() as follows:
>>
>> if (this_cpu_inc_return(trace_buffered_event_cnt) == 1) {
>> smp_rmb();
>
> This is the problem I have with your approach. That smp_rmb() is in the
> highly critical path. On some architectures, this has a significant impact
> on the overhead of this code. This path is called during code execution and
> increases the overhead of the tracing infrastructure.
>
> If I'm given two solutions where one adds a smp_rmb() to the critical path
> and the other just slows down the non-critical path more, I'll take the
> slow down of non-critical path every time.
>
>> if ((entry = __this_cpu_read(trace_buffered_event))) {
>> [...]
>>
>> That saves the synchronize_rcu() call but additionally modifies
>> trace_buffered_event_cnt even if trace_buffered_event is currently NULL.
>>
>> Another alternative is the approach taken by my patch which avoids more
>> RCU work and unnecessary memory modifications.
>>
>> I'd be interested if you could have a look again at what I'm proposing
>> in my patch. I think it simplifies the code while addressing these
>> problems as well. However, if you have reservations about that approach
>> then it is ok, I can fix the found problems individually as discussed.
>
> Fix this without adding any memory barriers to the critical path, then I'll
> take another look.
>
> FYI, this code was designed in the first place to avoid adding memory
> barriers in the critical path.

Thank you for the explanation. I ran the tests you mentioned in the
description of commit 0fc1b09ff1ff ("tracing: Use temp buffer when
filtering events") to understand this aspect a bit more. I confirm that
my proposed patch makes the tracing slower, usually by single digit
percentages. I understand this is not welcome. I also realize that the
slowdown might be even worse in different situations and on other
architectures that I checked (arm64, x86_64).

Ok, keeping the current approach, my plan for v2 is to prepare the
following patches:

* Fix for the missing increment+decrement of trace_buffered_event_cnt
on the current CPU in trace_buffered_event_disable().

Replace smp_call_function_many() with on_each_cpu_mask() in
trace_buffered_event_disable(). The on_each_cpu_mask() function has
also an advantage that it itself disables preemption so doing that can
be then removed from trace_buffered_event_disable().

* Fix the potential race between trace_buffered_event_enable() and
trace_event_buffer_lock_reserve() where the latter might already see
a valid trace_buffered_event pointer but not all initialization yet.

I think this might be actually best to address by using the same
maintenance exclusion as is implemented in
trace_buffered_event_disable(). It would make both maintenance
operations consistent but for the cost of making the enable operation
somewhat slower.

* Fix the WARN_ON_ONCE(!trace_buffered_event_ref) issued in
trace_buffered_event_disable() when trace_buffered_event_enable()
previously fails.

Add a variable/flag tracking whether trace_buffered_event is currently
allocated and use that for driving if a new allocation needs to be
done when trace_buffered_event_enable() is called, or the buffers
should be really freed when trace_buffered_event_disable() is invoked.

Not sure if the mentioned alternative of leaving trace_buffered_event
partially initialized on failure is preferred instead.

* Fix the potential race between trace_buffered_event_disable() and
trace_event_buffer_lock_reserve() where the latter might still grab
a pointer from trace_buffered_event that is being freed.

Replace smp_wmb() with synchronize_rcu() in
trace_buffered_event_disable().

Thanks,
Petr