Re: [GIT PULL] ring-buffer: Updates for v6.15
From: Steven Rostedt
Date: Fri Mar 28 2025 - 00:00:55 EST
Adding Vincent who did the user space memory mapping code, and is
working on having pKVM hyperviser mapping with this as well.
Some background too. The user space memory mapping work came out of a
discussion between Vincent and I at the tracing summit in 2023. Where
pKVM adds a hypervisor context that is not the kernel. It acts as a go
between of the kernel (host) and the guest, were the kernel does not
have access to the guest memory, nor to the hypervisor memory. But just
like user space, memory between the kernel and the hypervisor can be
mapped to each.
As the hypervisor is a black box, Vincent was looking for a way to add
tracing to it where the kernel can't manipulate the pages but can read
the hypervisor trace content. When discussing this, we found that this
had the same requirements as memory mapping the buffer with user space,
and since that was a long time request for tooling that only read the
buffer and never saved it, we decided to do the work with user space
and eventual move it to the pKVM hypervisor.
The code is going to be split up a bit to allow the hypervisor to use
some of it. That was destined for 6.16 as well.
Since Vincent has working code for pKVM, and this may change, not to
mention that he's the original author of this code, it would be good
for him to see if he can also clean up this code in the process ;-)
Vincent,
The code for memory mapping needs to be cleaned up a bit. Namely, the
fact that we have two ways to map the ring buffer (one using the normal
kernel allocation the other with physically mapped memory). Ideally we
should keep track of the pages a bit better. I can fix up the
physically mapped memory part, but I'm not sure how this is going to
affect the pKVM changes you are making.
-- Steve
On Thu, 27 Mar 2025 20:36:00 -0700
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 27 Mar 2025 at 20:19, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, 27 Mar 2025 at 20:01, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > >
> > > Are you OK with the idea of moving the mapping code into the ring
> > > buffer code (which I think is cleaner) and then keeping track of that
> > > to get to the pages with the helper function?
> > >
> > > struct page *rb_get_page(struct trace_buffer *buffer, unsigned long addr)
> > > {
> > > if (buffer->flags & RB_FL_PHYSICAL) {
> > > addr -= buffer->vmap_start;
> > > addr += buffer->phys_start;
> > > return pfn_to_page(addr >> PAGE_SHIFT);
> > > }
> > > return virt_to_page(addr);
> > > }
> >
> > I think that would have been *enormously* better, yes
>
> .. actually, thinking a bit more about it, I think you want another
> level of abstraction to deal with the chunking side (ie I think you
> want to make the subbuf_ids[] thing explicit and the 'page order' as
> well).
>
> So you have that 'meta_buffer' thing for the first page, but you also
> have that subbuf_ids[] array with pages chunked by 'order'), and I
> think you need another level of abstraction for those.
>
> IOW, you'd have 'rb_get_meta_buffer_page()' for the first page, and
> then something like 'rb_get_buffer_page()' for the subbuf_id[] thing.
>
> Or something in that direction. I didn't look *too* closely at what
> the code actually needs, I'm just trying to make sure that you don't
> pass random kernel addresses around, but something more structured.
>
> Preferably maybe just the page index *within* the ring buffer,
> perhaps, and have the helper function do the whole "is this the meta
> buffer or one of the subbuf_id things"?
>
> Hmm?
>
> More clear abstraction is what I'm looking for, in other words.
>
> Linus