Re: [PATCH tip 0/3] Improvements of scheduler related Tracepoints

From: Teng Qin
Date: Fri Dec 15 2017 - 03:54:32 EST

On 12/14/17, 23:40, "Peter Zijlstra" <peterz@xxxxxxxxxxxxx> wrote:
> On Thu, Dec 14, 2017 at 07:16:00PM -0800, Alexei Starovoitov wrote:
> > On 12/14/17 12:49 PM, Peter Zijlstra wrote:
> > > On Thu, Dec 14, 2017 at 12:20:41PM -0800, Teng Qin wrote:
> > > > This set of commits attempts to improve three scheduler related
> > > > Tracepoints: sched_switch, sched_process_fork, sched_process_exit.
> > > >
> > > > Firstly, these commits add additional flag values, namely preempt,
> > > > clone_flags and group_dead, to these Tracepoints, to make the
> > > > information exposed via the Tracepoints more useful and complete.
> > > >
> > > > Secondly, these commits expose task_struct pointers in these
> > > > Tracepoints. The task_struct pointers are arguments of the Tracepoints
> > > > and are currently only used to compute struct field values. But for BPF
> > > > programs attached to these Tracepoints, we may want to read additional
> > > > task information via the task_struct pointers. This is currently either
> > > > impossible, or we have to make assumptions about whether the Tracepoint
> > > > is running from previous / parent or next / child, and use the current
> > > > pointer instead. Exposing the task_struct pointers explicitly makes such use
> > > > case easier and more reliable.
> > > >
> > >
> > > NAK
> >
> > not sure what is the concern here.
> > Is it first or second part of the above ?
>
> Definitely the second, but also the first. You know I would have ripped
> out all scheduler tracepoints if I could have. They're a pain in the
> arse.
>
> A lot of people want to add to the tracepoints, with the end result that
> they'll end up a big bloated pile of useless crap. The first part is
> just the pieces you want added.
>
> As to the second, that's complete crap; that just makes everything
> slower for nobody's benefit. If you register a traceprobe you already get
> access to these things.

Having access to the related task_struct is one of the main purposes of these
patches. Take sched_switch as an example. Today we depend on whether the
Tracepoint is called from prev or from next (which could, although it is
unlikely, change) and use current to get that task_struct. Correct me if
I'm wrong, but that seems to defeat the purpose of Tracepoints being more
implementation-independent than kprobes. We then have to find another
Tracepoint, or more likely a kprobe function, to get the other (prev or next)
task_struct.
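
To make the concern concrete, here is a minimal user-space sketch. It is not
kernel or BPF code: struct task, current_task and fire_switch() are
hypothetical stand-ins I made up for illustration, and nvcsw is just an
example of per-task data that is not among the exported Tracepoint fields.
It only models the difference between a handler that falls back on current
and one that is handed the prev pointer explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the kernel's task_struct. */
struct task {
    int pid;
    unsigned long nvcsw;    /* example of extra per-task data a tracer may want */
};

/* Hypothetical stand-in for the kernel's `current` pointer. */
static struct task *current_task;

/* Style 1: the handler gets no task pointers and falls back on `current`,
 * silently assuming that `current` is still prev when the hook fires. */
static unsigned long prev_nvcsw_via_current(void)
{
    return current_task->nvcsw;
}

/* Style 2: the handler is handed prev explicitly, so hook placement
 * does not matter. */
static unsigned long prev_nvcsw_via_pointer(const struct task *prev)
{
    return prev->nvcsw;
}

/* Simulate one context switch from prev to next. hook_after_switch selects
 * whether the hook fires before (0) or after (1) `current` is updated. */
static void fire_switch(struct task *prev, struct task *next,
                        int hook_after_switch,
                        unsigned long *via_current,
                        unsigned long *via_pointer)
{
    assert(prev != NULL && next != NULL);

    current_task = prev;
    if (hook_after_switch)
        current_task = next;    /* hook placement moved past the switch */

    *via_current = prev_nvcsw_via_current();
    *via_pointer = prev_nvcsw_via_pointer(prev);

    current_task = next;        /* the switch is complete either way */
}
```

With the hook fired before the switch, both styles read prev's data. If the
hook placement ever moved past the switch, the current-based handler would
read next's data without any error, while the pointer-based handler would
be unaffected.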

> I think your problem is that you use perf to get access to the
> tracepoints, which then means you have to do disgusting things like
> this.