Re: [PATCH v2 1/2] ring-buffer: Introducing ring-buffer mapping functions

From: Vincent Donnefort
Date: Thu Mar 30 2023 - 06:31:20 EST


On Wed, Mar 29, 2023 at 11:32:34AM -0400, Steven Rostedt wrote:
> On Wed, 29 Mar 2023 14:55:41 +0100
> Vincent Donnefort <vdonnefort@xxxxxxxxxx> wrote:
>
> > > Yes, in fact it shouldn't need to call the ioctl until after it read it.
> > >
> > > Maybe, we should have the ioctl take a parameter of how much was read?
> > > To prevent races?
> >
> > Races would only be with other consuming readers. In that case we'd probably
> > have many other problems anyway as I suppose nothing would prevent another one
> > from swapping the page while our userspace reader is still processing it?
>
> I'm not worried about user space readers. I'm worried about writers, as
> the ioctl will update the reader_page->read = reader_page->commit. The time
> that the reader last read and stopped and then called the ioctl, a writer
> could fill the page, then the ioctl may even swap the page. By passing in
> the read amount, the ioctl will know if it needs to keep the same page or
> not.

How about?

userspace:

	prev_read = meta->read;
	ioctl(TRACE_MMAP_IOCTL_GET_READER_PAGE);

kernel:
	ring_buffer_get_reader_page()
		reader = rb_get_reader_page(cpu_buffer);
		cpu_buffer->reader_page->read = rb_page_size(reader);
		meta->read = cpu_buffer->reader_page->read;

userspace:
	/* if new page, prev_read = 0 */
	/* read between prev_read and meta->read */

If the writer does anything in between, wouldn't rb_get_reader_page() handle it
nicely by returning the same reader page, as there would be more to read?

It is similar to rb_advance_reader() except we'd be moving several events at
once?
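
To make the flow above concrete, here is a small userspace-only model of it.
This is just a sketch: every fake_* name is made up for illustration, there are
only two pages, and detecting "new page" by comparing a reader page id in the
meta-page is an assumption of the sketch, not something the patch defines.
fake_get_reader_page() stands in for the ioctl and keeps the same page as long
as there is unread data on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_PAGE_SIZE 64

/* Hypothetical stand-ins for the kernel structures, not the real ABI. */
struct fake_page {
	size_t commit;			/* bytes committed by the writer */
	size_t read;			/* bytes already handed to the reader */
	char data[FAKE_PAGE_SIZE];
};

struct fake_meta {
	int reader_id;			/* assumed: identifies the reader page */
	size_t read;			/* mirrors reader_page->read */
};

struct fake_cpu_buffer {
	struct fake_page pages[2];
	int reader;			/* index of the current reader page */
	struct fake_meta meta;
};

/* Writer side: append bytes to a page and advance its commit. */
static void fake_write(struct fake_page *page, const char *s)
{
	size_t len = strlen(s);

	assert(page->commit + len <= FAKE_PAGE_SIZE);
	memcpy(page->data + page->commit, s, len);
	page->commit += len;
}

/* Models the GET_READER_PAGE ioctl from the flow above. */
static void fake_get_reader_page(struct fake_cpu_buffer *buf)
{
	struct fake_page *reader = &buf->pages[buf->reader];

	/* Only move to the other page once this one is fully consumed. */
	if (reader->read == reader->commit) {
		reader->read = 0;
		reader->commit = 0;
		buf->reader ^= 1;
		reader = &buf->pages[buf->reader];
	}

	reader->read = reader->commit;
	buf->meta.reader_id = buf->reader;
	buf->meta.read = reader->read;
}

/*
 * Userspace side: copy what lies between prev_read and meta->read.
 * 'out' must hold at least FAKE_PAGE_SIZE + 1 bytes.
 */
static size_t fake_consume(struct fake_cpu_buffer *buf, char *out)
{
	int prev_id = buf->meta.reader_id;
	size_t prev_read = buf->meta.read;
	size_t len;

	fake_get_reader_page(buf);
	if (buf->meta.reader_id != prev_id)
		prev_read = 0;		/* new page: start from the beginning */

	len = buf->meta.read - prev_read;
	memcpy(out, buf->pages[buf->meta.reader_id].data + prev_read, len);
	out[len] = '\0';
	return len;
}
```

In this model, a write landing on the reader page between two fake_consume()
calls simply shows up as extra bytes on the same page in the next call, which
is the behaviour I would expect from rb_get_reader_page().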

>
> >
> > I don't know if this is worth splitting the ABI between the meta-page and the
> > ioctl parameters for this?
> >
> > Or maybe we should say the meta-page contains things modified by the writer and
> > parameters modified by the reader are passed by the get_reader_page ioctl i.e.
> > the reader page ID and cpu_buffer->reader_page->read? (for the hyp tracing, we
> > have up to 4 registers for the HVC which would replace in our case the ioctl)
>
> I don't think we need the reader_page id, as that should never move without
> reader involvement. If there's more than one reader, that's up to the
> readers to keep track of each other, not the kernel.
>
> Which BTW, the more I look at doing this without ioctls, I think we may
> need to update things slightly differently.
>
> I would keep the current approach, but for clarification of terminology, we
> have:
>
> meta_data - the data that holds information that is shared between user and
> kernel space.
>
> data_pages - this is a separate mapping that holds the mapped ring buffer
> pages. In user space, this is one contiguous array and also holds
> the reader page.
>
> data_index - This is an array of what the writer sees. It maps the index
> into data_pages[] of where to find the mapped pages. It does not
> contain the reader page. We currently map this with the meta_data,
> but that's not a requirement (although we may continue to do so).
>
> I'm thinking that we make the data_index[] elements into a structure:
>
> struct trace_map_data_index {
> 	int	idx;	/* index into data_pages[] */
> 	int	cnt;	/* counter updated by writer */
> };
>
> The cnt is initialized to zero when initially mapped.
>
> Instead of having the bpage->id = index into data_pages[], have it equal
> the index into data_index[].
>
> The cpu_buffer->reader_page->id = -1;
>
> meta_data->reader_page = index into data_pages[] of reader page
>
> The swapping of the head page would look something like this:
>
> static inline void
> rb_meta_page_head_swap(struct ring_buffer_per_cpu *cpu_buffer)
> {
> 	struct ring_buffer_meta_page *meta = cpu_buffer->meta_page;
> 	int head_page;
>
> 	if (!READ_ONCE(cpu_buffer->mapped))
> 		return;
>
> 	head_page = meta->data_pages[meta->hdr.data_page_head];
> 	meta->data_pages[meta->hdr.data_page_head] = meta->hdr.reader_page;
> 	meta->hdr.reader_page = head_page;
> 	meta->data_pages[head_page]->id = -1;
> }
>
> As hdr.data_page_head would be an index into data_index[] and not
> data_pages[].
>
> The fact that bpage->id points to the data_index[] and not the data_pages[]
> means that the writer can easily get to that index, and modify the count.
> That way, in rb_tail_page_update() (between cmpxchgs) we can do something
> like:
>
> 	if (cpu_buffer->mapped) {
> 		meta = cpu_buffer->meta_page;
> 		meta->data_index[next_page->id].cnt++;
> 	}
>
> And this will allow the reader to know if the current page it is on just
> got overwritten by the writer, by doing:
>
> 	prev_id = meta->data_index[this_page].cnt;
> 	smp_rmb();
> 	read event (copy it, whatever)
> 	smp_rmb();
> 	if (prev_id != meta->data_index[this_page].cnt)
> 		/* read data may be corrupted, abort it */

Couldn't the reader just check the page commit field? rb_iter_head_event()
does something like this to check if the writer is on its page.
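
For comparison, the counter check quoted above could be sketched in userspace
roughly as follows. The fake_* names are hypothetical, and C11 acquire fences
stand in for smp_rmb(); this is only an illustration of the begin/recheck
pattern, not what the patch implements. Note that the event copy happening
between the two calls would still race with the writer, so the copied data must
be thrown away whenever the recheck fails.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace view of one data_index[] element. */
struct fake_data_index {
	int idx;		/* index into data_pages[] */
	atomic_int cnt;		/* bumped by the writer when it moves onto the page */
};

/* Writer side: models the cnt++ done in rb_tail_page_update(). */
static void fake_writer_enter_page(struct fake_data_index *di)
{
	atomic_fetch_add_explicit(&di->cnt, 1, memory_order_release);
}

/* Reader side: sample the counter before touching the page data. */
static int fake_read_begin(struct fake_data_index *di)
{
	int cnt = atomic_load_explicit(&di->cnt, memory_order_relaxed);

	/* Plays the role of the first smp_rmb(). */
	atomic_thread_fence(memory_order_acquire);
	return cnt;
}

/* Reader side: true if the writer got onto the page since fake_read_begin(). */
static bool fake_read_retry(struct fake_data_index *di, int prev)
{
	/* Plays the role of the second smp_rmb(). */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&di->cnt, memory_order_relaxed) != prev;
}
```

A reader would call fake_read_begin(), copy the event, then drop the copy if
fake_read_retry() returns true. A commit-based check would have the same
shape, with the page commit field sampled instead of cnt.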

>
>
> Does this make sense?
>
> -- Steve