Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

From: Peter Zijlstra
Date: Sat Dec 08 2018 - 05:44:42 EST


On Fri, Dec 07, 2018 at 03:14:33PM -0500, Steven Rostedt wrote:
> On Fri, 7 Dec 2018 16:11:05 +0100
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > On Fri, Dec 07, 2018 at 08:41:18AM -0500, Steven Rostedt wrote:
> > > On Fri, 7 Dec 2018 09:58:39 +0100
> > > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > > These patches give no justification *what*so*ever* for why we're doing
> > > > ugly arse things like this. And why does this, whatever this is, need to
> > > > be done in perf?
> > > >
> > > > IOW, what problem are we solving?
> > >
> > > I guess the cover letter should have had a link (or copy) of this:
> > >
> > > http://lkml.kernel.org/r/20181128134700.212ed035@xxxxxxxxxxxxxxxxxx
> >
> > That doesn't even begin to explain. Who cares about strace and why? And
> > why is it such a bad thing to lose the occasional record etc.?
>
> Who cares about strace? Do I really need to answer that? It's one of
> the most used tools for seeing what a program is doing.

It's a tool I haven't used in years, given we have so many better tools
around these days.

> Why do we care about lost events? Because strace records *all* events,
> as that's what it does and that's what it always has done. It would be
> a break in functionality (a regression) if it were to start losing
> events. I use strace to see everything that an application is doing.

So make a new tool; break the expectation of all events. See if there's
anybody that really cares.

> When we discussed this at plumbers, Oracle people came to me and said
> how awesome it would be to run strace against their database accesses.
> The problem today is that strace causes such a large overhead that it
> isn't feasible to trace any high speed applications, especially if
> there are time constraints involved.

So have them run that perf thing acme pointed to.

So far nobody's made a good argument for why we cannot have LOST events.