Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support

From: Masami Hiramatsu
Date: Thu Feb 09 2017 - 09:32:11 EST


Hi Tom,

On Wed, 08 Feb 2017 19:14:22 -0600
Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx> wrote:

> > > I'm submitting the patchset (based on tracing/for-next) as an RFC not
> > > only to get comments, but because there are still some problems I
> > > haven't fixed yet...
> > >
> > > Here are some examples that should make things less abstract.
> > >
> > > ====
> > > Example - wakeup latency
> > > ====
> > >
> > > This basically implements the -RT latency_hist 'wakeup_latency'
> > > histogram using the synthetic events, variables, and actions
> > > described. The output below is from a run of cyclictest using the
> > > following command:
> > >
> > > # rt-tests/cyclictest -p 80 -n -s -t 2
> > >
> > > What we're measuring the latency of is the time between when a
> > > thread (of cyclictest) is awakened and when it's scheduled in. To
> > > do that we add triggers to sched_wakeup and sched_switch with the
> > > appropriate variables, and on a matching sched_switch event,
> > > generate a synthetic 'wakeup_latency' event. Since it's just
> > > another trace event like any other, we can also define a histogram
> > > on that event, the output of which is what we see displayed when
> > > reading the wakeup_latency 'hist' file.
> > >
> > > First, we create a synthetic event called wakeup_latency, that
> > > references 3 variables from other events:
> > >
> > > # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> > > pid=sched_switch:woken_pid \
> > > prio=sched_switch:woken_prio' >> \
> > > /sys/kernel/debug/tracing/synthetic_events
> > >
> > > Next we add a trigger to sched_wakeup, which saves the value of the
> > > 'common_timestamp' when that event is hit in a variable, ts0. Note
> > > that this happens only when 'comm==cyclictest'.
> > >
> > > Also, 'common_timestamp' is a new field defined on every event (if
> > > needed - if there are no users of timestamps in a trace, timestamps
> > > won't be saved and there's no additional overhead from that).
> > >
> > > # echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
> > > comm=="cyclictest"' >> \
> > > /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > >
> > > Next, we add a trigger to sched_switch. When the pid being switched
> > > to matches the pid woken up by a previous sched_wakeup event, this
> > > event grabs the ts0 saved on that event, takes the difference
> > > between it and the current sched_switch's common_timestamp, and
> > > assigns it to a new 'wakeup_lat' variable. It also saves a couple
> > > other variables and then invokes the onmatch().trace() action which
> > > generates a new wakeup_latency event using those variables.
> > >
> > > # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
> > > wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
> > > if next_comm=="cyclictest"' >> \
> > > /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> >
> > Hmm, this looks a bit hard to understand, I guess that onmatch() means
> > "if there is an event which has ts0 variable and the event's key matches
> > this key, take some action".
>
> Yes, that's pretty much it. It's essentially shorthand for this kind of
> common idiom, where timestamp[] is an associative array, which in our
> case is the tracing_map of the histogram:
>
> event sched_wakeup()
> {
> ts0[wakeup_pid] = now()
> }
>
> event sched_switch()
> {
> if (ts0[next_pid])
> latency = now() - ts0[next_pid] /* next_pid == wakeup_pid */
> }
>
> Only if ts0 has already been set does the onmatch() get invoked - if ts0
> hasn't been set, there's no match and the trace(wakeup_latency) doesn't
> happen.

OK, it reminds me other questions.

- Even if there is no matched ts0, sched_switch's hist will store
woken_pid etc on its histogram map?
- If there is matched ts0 and wakeup_latency event has been kicked,
the matched entry on ts0 is removed? and also in that case what
happens on sched_switch's hist?

> > I think there are 2 indefinate point that
> > - Where the 'ts0' came from? what the variable will have 'global' scope?
>
> ts0 is basically a per-table-entry variable - there's one for each entry
> in the table, and it can only be accessed by events with matching keys.
> The table owns the variable name, so you can't have two different tables
> with the ts0 variable.

Would you mean 'ts0' is a special name?

> So if we create a histogram on event1 and associate a variable ts0 with
> it, any event hit on that histogram assigns to the corresponding entry's
> ts0 instance.
>
> If we create a histogram on event2 which references ts0, it knows that
> ts0 belongs to event1's histogram, and when there's a hit on event2, the
> same key is used to look up the entry corresponding to that key on
> event1, and if there's a matching entry, it grabs the value of ts0 from
> that and subtracts it from the current event's value to produce the
> latency or whatever it is.
>
> So, that's a long-winded way of saying that the name ts0 is global
> across all tables (histograms) but an instance of ts0 is local to each
> entry in the table that owns the name.

Ah, what I concerned was the scope of name... not instance.

Hmm, in that case, what about other variables in sched_switch?
it seems to have woken_pid,woken_prio and wakeup_lat. Are those also
becomes global instance?

Since I saw below definition, I expected those were not global.
> > > # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> > > pid=sched_switch:woken_pid \
> > > prio=sched_switch:woken_prio' >> \
> > > /sys/kernel/debug/tracing/synthetic_events

And if so, it is very unsure for users to check what variables are
already defined. I think we'd better to have a 'global'tag for ts0.

> > - What matches to what? onmatch() doesn't tell it.
> >
>
> It's implied by the references to other events - in order for ts0 to be
> resolved, it needs to find the match on event1. Also, the synthetic
> event has references to variables on other events - in order to generate
> the synthetic event, those variables also need to be resolved to
> matching events - note that variables can also come from the current
> event as well.

I don't like such implications, which can make users lost in events easily,
especially for triggers since we don't have the system-wide list of triggers.
IMHO, since this interface is for a kind of programming, it should provide
the abstract but consistent system model too. Implication will mislead
users.

> Hope that clears things up a bit (although the details under the covers
> might seem confusing).

Thank you,

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>