Re: [PATCH 2/2] ring-buffer: Fix a race between readers and resize checks
From: Steven Rostedt
Date: Mon May 27 2024 - 19:44:09 EST
On Mon, 27 May 2024 11:36:55 +0200
Petr Pavlu <petr.pavlu@xxxxxxxx> wrote:

> >> static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
> >> {
> >> @@ -2200,8 +2205,13 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
> >>  		 */
> >>  		synchronize_rcu();
> >>  		for_each_buffer_cpu(buffer, cpu) {
> >> +			unsigned long flags;
> >> +
> >>  			cpu_buffer = buffer->buffers[cpu];
> >> +			raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> >>  			rb_check_pages(cpu_buffer);
> >> +			raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock,
> >> +						   flags);
> >
> > Putting my RT hat on, I really don't like the above fix. The
> > rb_check_pages() iterates all subbuffers which makes the time interrupts
> > are disabled non-deterministic.
>
> I see, this also applies to the same rb_check_pages() validation invoked
> from ring_buffer_read_finish().
>
> >
> > Instead, I would rather have something where we disable readers while we do
> > the check, and re-enable them.
> >
> > raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> > cpu_buffer->read_disabled++;
> > raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> >
> > // Also, don't put flags on a new line. We are allowed to go up to 100 characters now.
>
> Noted.
>
> >
> >
> > rb_check_pages(cpu_buffer);
> > raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> > cpu_buffer->read_disabled--;
> > raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> >
> > Or something like that. Yes, that also requires creating a new
> > "read_disabled" field in the ring_buffer_per_cpu code.
>
> I think this would work but I'm personally not immediately sold on this
> approach. If I understand the idea correctly, readers should then check
> whether cpu_buffer->read_disabled is set and bail out with some error if
> that is the case. The rb_check_pages() function is only a self-check
> code and as such, I feel it doesn't justify disrupting readers with
> a new error condition and adding more complex locking.

Honestly, this code was never made for more than one reader per
cpu_buffer. I'm perfectly fine if check_pages() causes other
readers of the same per_cpu buffer to get -EBUSY.

Do you really see this being a problem? What use case is there for
hitting the check_pages() and reading the same cpu buffer at the same
time?
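
To make the read_disabled idea concrete, here is a rough userspace sketch, not kernel code and not a proposed patch. It assumes a pthread mutex in place of the raw spinlock reader_lock, and every model_* name is made up for illustration. The point is only that the lock is held for the counter update and the reader's check, while the page walk itself would run with the lock dropped:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Userspace model only. struct cpu_buffer_model stands in for
 * struct ring_buffer_per_cpu, and a pthread mutex stands in for the
 * raw spinlock reader_lock. All model_* names are hypothetical.
 */
struct cpu_buffer_model {
	pthread_mutex_t reader_lock;
	int read_disabled;	/* proposed new field; > 0 means readers bail out */
	unsigned long reads;	/* reads that were allowed to proceed */
};

/* Block new readers; the lock is held only for the counter update. */
static void model_read_disable(struct cpu_buffer_model *cb)
{
	pthread_mutex_lock(&cb->reader_lock);
	cb->read_disabled++;
	pthread_mutex_unlock(&cb->reader_lock);
}

/* Let readers back in once the check is done. */
static void model_read_enable(struct cpu_buffer_model *cb)
{
	pthread_mutex_lock(&cb->reader_lock);
	assert(cb->read_disabled > 0);	/* disable/enable must be balanced */
	cb->read_disabled--;
	pthread_mutex_unlock(&cb->reader_lock);
}

/* Reader side: give up with -EBUSY while a check is in progress. */
static int model_read(struct cpu_buffer_model *cb)
{
	int ret = 0;

	pthread_mutex_lock(&cb->reader_lock);
	if (cb->read_disabled)
		ret = -EBUSY;
	else
		cb->reads++;	/* placeholder for actually consuming events */
	pthread_mutex_unlock(&cb->reader_lock);

	return ret;
}

/* Checker side: the long walk runs without holding reader_lock. */
static void model_check_pages(struct cpu_buffer_model *cb)
{
	model_read_disable(cb);
	/* placeholder: the rb_check_pages() style walk over all subbuffers */
	model_read_enable(cb);
}
```

A reader that calls model_read() between model_read_disable() and model_read_enable() gets -EBUSY and is expected to retry or give up. The sketch ignores whatever state an in-progress reader may carry between lock acquisitions, so take it as an illustration of the locking shape and nothing more.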
-- Steve