Re: [RFC PATCH 00/11] bpf, trace, dtrace: DTrace BPF program type implementation and sample use

From: Alexei Starovoitov
Date: Wed May 22 2019 - 15:58:21 EST


On Wed, May 22, 2019 at 02:22:15PM -0400, Kris Van Hees wrote:
> On Wed, May 22, 2019 at 04:25:32PM +0200, Peter Zijlstra wrote:
> > On Tue, May 21, 2019 at 10:56:18AM -0700, Alexei Starovoitov wrote:
> >
> > > and no changes are necessary in kernel/events/ring_buffer.c either.
> >
> > Let me just NAK them on the principle that I don't see them in my inbox.
>
> My apologies for failing to include you on the Cc for the patches. That was
> an oversight on my end and certainly not intentional.
>
> > Let me further NAK it for adding all sorts of garbage to the code --
> > we're not going to do gaps and stay_in_page nonsense.
>
> Could you give some guidance in terms of an alternative? The ring buffer code
> provides both non-contiguous page allocation support and a vmalloc-based
> allocation, and the vmalloc version certainly would avoid the entire gap and
> page boundary stuff. But since the allocator is chosen at build time based on
> the arch capabilities, there is no way to select a specific memory allocator.
> I'd be happy to use an alternative approach that allows direct writing into
> the ring buffer.

You do not _need_ direct write from bpf prog.
dtrace language doesn't mandate direct write.
'direct write into ring buffer from bpf prog' is an interesting idea and
may be a nice performance optimization, but it is in no way a blocker for dtrace scripts.
Also it's far from clear that it actually brings performance benefits.
Letting bpf progs write directly into the ring buffer comes with
a lot of corner cases. It's something that needs careful analysis.
I suggest proceeding with the user space dtrace conversion to bpf
without introducing kernel changes.