Re: [PATCH v2 1/2] ring-buffer: Introducing ring-buffer mapping functions
From: Steven Rostedt
Date: Wed Mar 29 2023 - 11:33:22 EST
On Wed, 29 Mar 2023 14:55:41 +0100
Vincent Donnefort <vdonnefort@xxxxxxxxxx> wrote:
> > Yes, in fact it shouldn't need to call the ioctl until after it read it.
> >
> > Maybe, we should have the ioctl take a parameter of how much was read?
> > To prevent races?
>
> Races would only be with other consuming readers. In that case we'd probably
> have many other problems anyway as I suppose nothing would prevent another one
> from swapping the page while our userspace reader is still processing it?
I'm not worried about user space readers. I'm worried about writers, as
the ioctl will set reader_page->read = reader_page->commit. Between the
time the reader last read and stopped, and the time it called the ioctl, a
writer could fill the page, and the ioctl may even swap the page. By passing
in the read amount, the ioctl will know whether it needs to keep the same
page or not.
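To make the race concrete, here is a small user-space model of the reader page, assuming the ioctl is handed the amount user space actually read. The names (model_reader_page, model_consume) are hypothetical and are not part of the patch or any kernel API; the point is only that recording the caller's read amount, instead of copying commit into read, leaves data committed after the reader stopped still unread.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the reader page: 'read' is how far the reader
 * has consumed, 'commit' is how far the writer has committed.
 */
struct model_reader_page {
	size_t read;
	size_t commit;
};

/*
 * Hypothetical handler for a "get next reader page" request that is told
 * how much user space actually read. Returns true if a new reader page may
 * be swapped in, false if the current page still holds unread data.
 */
bool model_consume(struct model_reader_page *page, size_t user_read)
{
	/* Never accept a read amount beyond what the writer committed. */
	if (user_read > page->commit)
		user_read = page->commit;

	/* Record only what user space really consumed, not the commit. */
	page->read = user_read;

	/* Unread data remains: keep the same page. */
	return page->read == page->commit;
}
```

If the writer commits up to 200 bytes after user space stopped at 100, model_consume(&page, 100) returns false and leaves read at 100, whereas setting read = commit would silently skip those 100 bytes.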
>
> I don't know if this is worth splitting the ABI between the meta-page and the
> ioctl parameters for this?
>
> Or maybe we should say the meta-page contains things modified by the writer and
> parameters modified by the reader are passed by the get_reader_page ioctl i.e.
> the reader page ID and cpu_buffer->reader_page->read? (for the hyp tracing, we
> have up to 4 registers for the HVC which would replace in our case the ioctl)
I don't think we need the reader_page id, as that should never move without
reader involvement. If there's more than one reader, that's up to the
readers to keep track of each other, not the kernel.
BTW, the more I look at doing this without ioctls, the more I think we may
need to update things slightly differently.
I would keep the current approach, but to clarify the terminology, we
have:
  meta_data  - The data that holds information shared between user and
               kernel space.
  data_pages - A separate mapping that holds the mapped ring buffer pages.
               In user space, this is one contiguous array that also holds
               the reader page.
  data_index - An array of what the writer sees. Each entry is the index
               into data_pages[] where the corresponding mapped page is
               found. It does not contain the reader page. We currently map
               this with the meta_data, but that's not a requirement
               (although we may continue to do so).
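A rough sketch of how these three pieces relate, written as a user-space model. All names and sizes here (model_meta, model_data_pages, MODEL_NR_PAGES, MODEL_PAGE_SIZE) are hypothetical; the real layout is whatever the patch defines. data_index[] is shown as plain ints, as in the current approach.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 64 /* hypothetical, tiny page for illustration */
#define MODEL_NR_PAGES  4  /* hypothetical number of writer pages */

/* data_pages: one contiguous array holding every ring page plus the reader page. */
struct model_data_pages {
	unsigned char page[MODEL_NR_PAGES + 1][MODEL_PAGE_SIZE];
};

/* meta_data: the part shared between user and kernel space. */
struct model_meta {
	int reader_page;                /* index into data_pages[] */
	int data_index[MODEL_NR_PAGES]; /* writer order -> index into data_pages[] */
};

/* Resolve the i-th page in writer order to its mapped location. */
unsigned char *model_writer_page(struct model_meta *meta,
				 struct model_data_pages *pages, int i)
{
	return pages->page[meta->data_index[i]];
}

/* The reader page is found through meta_data, never through data_index[]. */
unsigned char *model_reader_page(struct model_meta *meta,
				 struct model_data_pages *pages)
{
	return pages->page[meta->reader_page];
}
```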
I'm thinking that we make the data_index[] elements into a structure:
	struct trace_map_data_index {
		int idx; /* index into data_pages[] */
		int cnt; /* counter updated by writer */
	};
The cnt is initialized to zero when initially mapped.
Instead of having the bpage->id = index into data_pages[], have it equal
the index into data_index[].
The reader page is not in data_index[], so cpu_buffer->reader_page->id = -1,
and meta_data->reader_page holds the index into data_pages[] of the reader page.
Swapping the head page would then look something like this:
	static inline void
	rb_meta_page_head_swap(struct ring_buffer_per_cpu *cpu_buffer)
	{
		struct ring_buffer_meta_page *meta = cpu_buffer->meta_page;
		int head_page;

		if (!READ_ONCE(cpu_buffer->mapped))
			return;

		/* data_page_head indexes data_index[]; .idx indexes data_pages[] */
		head_page = meta->data_index[meta->hdr.data_page_head].idx;
		meta->data_index[meta->hdr.data_page_head].idx = meta->hdr.reader_page;
		meta->hdr.reader_page = head_page;

		/* The new reader page is no longer in data_index[] */
		cpu_buffer->reader_page->id = -1;
	}
Note that hdr.data_page_head would now be an index into data_index[], not
into data_pages[].
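Below is a user-space model of the head swap under the proposed scheme, reusing struct trace_map_data_index as described above. It is a sketch under assumptions: model_meta, model_bpage and model_head_swap are hypothetical, and, unlike the sketch above, the function also receives the outgoing reader page's bpage so it can give that page the head slot's id. That extra step is an assumption about what the design needs once bpage->id indexes data_index[].

```c
#define MODEL_NR_PAGES 4 /* hypothetical ring size */

struct trace_map_data_index {
	int idx; /* index into data_pages[] */
	int cnt; /* counter updated by writer */
};

/* Hypothetical stand-in for the shared meta page header. */
struct model_meta {
	int reader_page;    /* index into data_pages[] */
	int data_page_head; /* index into data_index[] */
	struct trace_map_data_index data_index[MODEL_NR_PAGES];
};

/* Hypothetical stand-in for the kernel's buffer page. */
struct model_bpage {
	int id; /* index into data_index[], or -1 for the reader page */
};

/*
 * Exchange the reader page with the head page. 'old_reader' is the page
 * that rejoins the ring; 'new_reader' is the former head page.
 */
void model_head_swap(struct model_meta *meta,
		     struct model_bpage *old_reader,
		     struct model_bpage *new_reader)
{
	int slot = meta->data_page_head;            /* slot in data_index[] */
	int head_page = meta->data_index[slot].idx; /* its data_pages[] index */

	/* The old reader page takes over the head page's slot. */
	meta->data_index[slot].idx = meta->reader_page;
	old_reader->id = slot;

	/* The old head page becomes the reader page, outside data_index[]. */
	meta->reader_page = head_page;
	new_reader->id = -1;
}
```

The slot's cnt is left untouched here; whether it should be reset on a swap is not something the proposal above specifies.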
The fact that bpage->id indexes data_index[] rather than data_pages[]
means that the writer can easily find its entry and update the count.
That way, in rb_tail_page_update() (between cmpxchgs) we can do something
like:
	if (cpu_buffer->mapped) {
		struct ring_buffer_meta_page *meta = cpu_buffer->meta_page;

		meta->data_index[next_page->id].cnt++;
	}
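A user-space model of the writer side, assuming C11 atomics stand in for the kernel's primitives. model_cpu_buffer and model_tail_page_update are hypothetical; in the kernel the ordering would presumably come from the existing cmpxchgs in rb_tail_page_update() rather than from an explicit fence.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_PAGES 4 /* hypothetical ring size */

struct model_data_index {
	int idx;         /* index into data_pages[] (unused in this model) */
	atomic_uint cnt; /* bumped each time the writer moves onto this page */
};

/* Hypothetical stand-in for the per-CPU buffer state. */
struct model_cpu_buffer {
	bool mapped;
	int tail_id; /* bpage->id of the tail page: an index into data_index[] */
	struct model_data_index data_index[MODEL_NR_PAGES];
};

/*
 * Model of the writer moving onto the next page. The counter is bumped
 * before the writer starts overwriting the page; the release fence orders
 * the bump before the writer's later (atomic) stores into that page, so a
 * reader that sees any of those stores and then fences will see the new count.
 */
void model_tail_page_update(struct model_cpu_buffer *cb)
{
	int next_id = (cb->tail_id + 1) % MODEL_NR_PAGES;

	if (cb->mapped) {
		atomic_fetch_add_explicit(&cb->data_index[next_id].cnt, 1,
					  memory_order_relaxed);
		atomic_thread_fence(memory_order_release);
	}
	cb->tail_id = next_id;
}
```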
This would let the reader detect whether the page it is currently on just
got overwritten by the writer, by doing:
	prev_id = meta->data_index[this_page].cnt;
	smp_rmb();
	/* read event (copy it, whatever) */
	smp_rmb();
	if (prev_id != meta->data_index[this_page].cnt) {
		/* read data may be corrupted, abort it */
	}
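The reader-side check resembles a seqcount read section: sample the counter, copy, re-sample, and discard the copy if the counter moved. Here is a user-space sketch in C11 atomics, assuming the caller passes a pointer to meta->data_index[this_page].cnt; model_read_event and MODEL_PAGE_SIZE are hypothetical. The acquire load and the acquire fence play the roles of the two smp_rmb() calls.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 16 /* hypothetical, tiny page */

/*
 * Copy 'len' bytes at 'off' out of 'page' into 'out'. Returns false if the
 * range is out of bounds or if the writer's counter for this page changed
 * during the copy, in which case 'out' must be discarded.
 */
bool model_read_event(atomic_uint *cnt, _Atomic unsigned char *page,
		      size_t off, size_t len, unsigned char *out)
{
	unsigned int prev_cnt;

	if (off > MODEL_PAGE_SIZE || len > MODEL_PAGE_SIZE - off)
		return false;

	/* First smp_rmb(): later page loads cannot move before this load. */
	prev_cnt = atomic_load_explicit(cnt, memory_order_acquire);

	for (size_t i = 0; i < len; i++)
		out[i] = atomic_load_explicit(&page[off + i],
					      memory_order_relaxed);

	/* Second smp_rmb(): order the copy before re-reading the counter. */
	atomic_thread_fence(memory_order_acquire);

	return atomic_load_explicit(cnt, memory_order_relaxed) == prev_cnt;
}
```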
Does this make sense?
-- Steve