Re: [patch 1/2] x86_64 page fault NMI-safe

From: Ingo Molnar
Date: Tue Aug 03 2010 - 16:21:57 EST



* Ingo Molnar <mingo@xxxxxxx> wrote:

>
> * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Tue, Aug 3, 2010 at 12:45 PM, Mathieu Desnoyers
> > <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> > >
> > > The real issue here, IMHO, is that Perf has tied gory ring buffer
> > > implementation details to the userspace perf ABI, and there is now strong
> > > unwillingness from Perf developers to break this ABI.
>
> (Wrong.)
>
> > The thing is - I think my outlined buffer fragmentation model would work
> > fine with the perf ABI too. Exactly because there is no deep structure,
> > just the same "stream of small events" both from a kernel and a user model
> > standpoint. Sure, the stream would now contain a new event type, but that's
> > trivial. It would still be _entirely_ reasonable to have the actual data in
> > the exact same ring buffer, including the whole mmap'ed area.
>
> Yeah.
>
> > Of course, when user space actually parses it, user space would have to
> > eventually defragment the event by allocating a new area and copying the
> > fragments together in the right order, but that's pretty trivial to do. It
> > certainly doesn't affect the current mmap'ed interface in the least.
> >
> > Now, whether the perf people feel they want that kind of functionality, I
> > don't know. It's possible that they simply do not want to handle events that
> > are complex enough that they would have arbitrary size.
>
> Looks useful. There's a steady trickle of new events and we already use type
> encapsulation for things like trace events - which are only made sense of
> later on in user-space.
>
> We may want to add things like a NOP event to pad out the end of page

/me once again experiences the subtle difference between 'Y' and 'N' when postponing a mail

So adding fragments would be possible as well. We've got the space for such
extensions in the ABI and the basic model of streaming information is not
affected.
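
A minimal sketch of what the user-space defragmentation described above could
look like. The record type EV_FRAGMENT, the frag_header layout and both helper
names are hypothetical, chosen only for illustration; none of them is part of
the perf ABI:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fragment record; NOT the real perf record layout. */
enum { EV_FRAGMENT = 1000 };

struct frag_header {
    uint32_t type;      /* EV_FRAGMENT for a fragment record */
    uint16_t size;      /* record size in bytes, header included */
    uint16_t index;     /* position of this fragment within its event */
    uint32_t event_id;  /* which large event the fragment belongs to */
    uint32_t count;     /* total number of fragments in that event */
};

/* Append one fragment record at dst; returns the bytes written. */
size_t put_fragment(uint8_t *dst, uint32_t event_id, uint16_t index,
                    uint32_t count, const void *payload, uint16_t n)
{
    struct frag_header h = {
        .type = EV_FRAGMENT,
        .size = (uint16_t)(sizeof(struct frag_header) + n),
        .index = index,
        .event_id = event_id,
        .count = count,
    };
    memcpy(dst, &h, sizeof(h));
    memcpy(dst + sizeof(h), payload, n);
    return h.size;
}

/*
 * Walk a linear copy of the stream, collect every fragment of event_id and
 * return a freshly allocated buffer with the payloads joined in index order.
 * Returns NULL if a fragment is missing or the stream is inconsistent.
 */
uint8_t *defragment(const uint8_t *stream, size_t len,
                    uint32_t event_id, size_t *out_len)
{
    const uint8_t **parts = NULL;
    size_t *lens = NULL;
    uint8_t *out = NULL;
    uint32_t count = 0;
    size_t total = 0, pos = 0, off = 0;

    assert(stream && out_len);

    while (off + sizeof(struct frag_header) <= len) {
        struct frag_header h;

        memcpy(&h, stream + off, sizeof(h));    /* avoids unaligned access */
        if (h.size < sizeof(h) || off + h.size > len)
            break;                              /* corrupt or truncated */
        if (h.type == EV_FRAGMENT && h.event_id == event_id) {
            if (!parts) {
                if (h.count == 0)
                    goto done;
                count = h.count;
                parts = calloc(count, sizeof(*parts));
                lens = calloc(count, sizeof(*lens));
                if (!parts || !lens)
                    goto done;
            }
            if (h.count != count || h.index >= count || parts[h.index])
                goto done;                      /* inconsistent fragments */
            parts[h.index] = stream + off + sizeof(h);
            lens[h.index] = h.size - sizeof(h);
        }
        off += h.size;                          /* other records are skipped */
    }
    if (!parts)
        goto done;
    for (uint32_t i = 0; i < count; i++) {
        if (!parts[i])
            goto done;                          /* a fragment is missing */
        total += lens[i];
    }
    out = malloc(total ? total : 1);
    if (!out)
        goto done;
    for (uint32_t i = 0; i < count; i++) {
        memcpy(out + pos, parts[i], lens[i]);
        pos += lens[i];
    }
    *out_len = total;
done:
    free(parts);
    free(lens);
    return out;
}
```

The sketch assumes the reader has already copied the records out of the ring
into a linear buffer, so wrap-around is not handled here. Fragments may arrive
interleaved with other records and in any order; only the final copy imposes
the ordering.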

[ The control structure of the mmap area is there for performance/wakeup
optimizations (and to allow the kernel to lose information on producer
overload, while still giving user-space an idea that we lost data and how
much) - it does not affect semantics and does not limit us. ]
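
To illustrate the point about losing data on overload while still telling the
reader how much was lost, here is a toy single-threaded model. It is only a
sketch under stated assumptions: struct ring, rec_header, REC_SAMPLE, REC_LOST
and the function names are hypothetical, and the layout is not the real mmap
control page (which exposes data_head/data_tail in struct perf_event_mmap_page
and reports drops through PERF_RECORD_LOST records). A real implementation
would also need memory barriers between producer and consumer:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64u               /* power of two, so masking works */

/* Toy stand-in for the control structure plus data area (hypothetical). */
struct ring {
    uint64_t head;                  /* producer: next byte to write */
    uint64_t tail;                  /* consumer: next byte to read */
    uint64_t lost;                  /* records dropped since last report */
    uint8_t data[RING_SIZE];
};

struct rec_header { uint32_t type; uint32_t size; };
enum { REC_SAMPLE = 1, REC_LOST = 2 };

static void copy_in(struct ring *r, const void *src, size_t n)
{
    const uint8_t *p = src;

    for (size_t i = 0; i < n; i++)
        r->data[(r->head + i) & (RING_SIZE - 1)] = p[i];
    r->head += n;
}

static void copy_out(struct ring *r, void *dst, size_t n)
{
    uint8_t *p = dst;

    for (size_t i = 0; i < n; i++)
        p[i] = r->data[(r->tail + i) & (RING_SIZE - 1)];
    r->tail += n;
}

/* Producer: returns 1 if the record was stored, 0 if it was dropped. */
int ring_write(struct ring *r, const void *payload, uint32_t n)
{
    size_t free_bytes = RING_SIZE - (size_t)(r->head - r->tail);
    size_t need = sizeof(struct rec_header) + n;
    size_t lost_need = r->lost ? sizeof(struct rec_header) + sizeof(r->lost) : 0;

    if (need + lost_need > free_bytes) {
        r->lost++;                  /* overload: drop, but remember it */
        return 0;
    }
    if (r->lost) {                  /* report the gap before new data */
        struct rec_header lh = {
            REC_LOST, (uint32_t)(sizeof(struct rec_header) + sizeof(r->lost))
        };
        copy_in(r, &lh, sizeof(lh));
        copy_in(r, &r->lost, sizeof(r->lost));
        r->lost = 0;
    }
    struct rec_header h = { REC_SAMPLE, (uint32_t)need };
    copy_in(r, &h, sizeof(h));
    copy_in(r, payload, n);
    return 1;
}

/* Consumer: copies one payload into buf; returns its type, or 0 if empty. */
uint32_t ring_read(struct ring *r, void *buf, size_t cap, size_t *n)
{
    struct rec_header h;

    if (r->head == r->tail)
        return 0;
    copy_out(r, &h, sizeof(h));
    *n = h.size - sizeof(h);
    assert(*n <= cap);
    copy_out(r, buf, *n);
    return h.type;
}
```

The head/tail pair only decides when the producer may write and when it must
drop; the records themselves remain a plain stream, so adding another record
type does not change this bookkeeping.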

So there's no design limitation - Peter simply prefers one possible solution
over another and outlined his reasons - we should hash that out based on the
technical arguments.

Thanks,

Ingo