Re: [tip:tracing/core] Revert "x86, bts: reenable ptrace branch trace support"

From: Ingo Molnar
Date: Thu Jun 11 2009 - 17:41:50 EST



* Metzger, Markus T <markus.t.metzger@xxxxxxxxx> wrote:

> >-----Original Message-----
> >From: Ingo Molnar [mailto:mingo@xxxxxxx]
> >Sent: Thursday, June 11, 2009 12:22 PM
> >To: Peter Zijlstra
> >Cc: Metzger, Markus T; linux-kernel@xxxxxxxxxxxxxxx; mingo@xxxxxxxxxx; hpa@xxxxxxxxx; oleg@xxxxxxxxxx;
> >tglx@xxxxxxxxxxxxx; linux-tip-commits@xxxxxxxxxxxxxxx
> >Subject: Re: [tip:tracing/core] Revert "x86, bts: reenable ptrace branch trace support"
> >
> >
> >* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> >> On Thu, 2009-06-11 at 07:30 +0100, Metzger, Markus T wrote:
> >> > >-----Original Message-----
> >> > >From: tip-bot for Ingo Molnar [mailto:mingo@xxxxxxx]
> >> > >Sent: Thursday, June 11, 2009 1:37 AM
> >> > >To: linux-tip-commits@xxxxxxxxxxxxxxx
> >> > >Cc: hpa@xxxxxxxxx; mingo@xxxxxxxxxx; peterz@xxxxxxxxxxxxx; Metzger, Markus T; oleg@xxxxxxxxxx;
> >> > >tglx@xxxxxxxxxxxxx; mingo@xxxxxxx
> >> > >Subject: [tip:tracing/core] Revert "x86, bts: reenable ptrace branch trace support"
> >> > >
> >> > >Commit-ID: 511b01bdf64ad8a38414096eab283c7784aebfc4
> >> > >Gitweb: http://git.kernel.org/tip/511b01bdf64ad8a38414096eab283c7784aebfc4
> >> > >Author: Ingo Molnar <mingo@xxxxxxx>
> >> > >AuthorDate: Thu, 11 Jun 2009 00:32:00 +0200
> >> > >Committer: Ingo Molnar <mingo@xxxxxxx>
> >> > >CommitDate: Thu, 11 Jun 2009 00:32:00 +0200
> >> > >
> >> > >Revert "x86, bts: reenable ptrace branch trace support"
> >> > >
> >> > >This reverts commit 7e0bfad24d85de7cf2202a7b0ce51de11a077b21.
> >> > >
> >> > >A late objection to the ABI has arrived:
> >> > >
> >> > > http://lkml.org/lkml/2009/6/10/253
> >> >
> >> > I thought that this has been resolved. See for example http://lkml.org/lkml/2009/6/10/257.
> >> >
> >> > Peter's concerns were that Debug Store details are exposed to user space, which is
> >> > not the case. Debug Store itself is fully in-kernel and the expectation of a
> >> > user-defined buffer can be implemented on top of the Debug Store changes that
> >> > Peter expects are needed to support PEBS.
> >> >
> >> > A user-defined trace buffer size is required to support
> >> > different usage models. Some users only need a small amount of
> >> > trace, whereas others need a large amount. The interface will have
> >> > to reflect that in some way.
> >>
> >> Right, your last email did explain how we could keep per-task
> >> in-kernel buffers and fill them from the DS and still have them of
> >> user-specified size.
> >>
> >> That would indeed keep the proposed ABI workable, what I'm still
> >> not liking is that this buffer is in-kernel, but I guess that
> >> might be something for other people to have an opinion on.
> >
> > Hm. Wrt. the ABI, wouldn't it make more sense to expose this PMU
> > feature via perfcounters: a sampling hw-branch-executions
> > counter, with interval=1.
> >
> > That would give the exact existing semantics, plus a lot
> > more. Markus?
>
> What more would we get?

There are numerous direct functional advantages:

- We will get all the sampling features of perfcounters, such as
timestamped samples and CPU ID samples. Some will be approximate
(timing), some precise (CPU ID).

- We will get the advanced workflow isolation features: we could
sample on a per CPU basis (system-wide BTS), and we could sample
child tasks automatically. The current code is limited to a
single task.

- We could also sample other types of information into the same
outgoing event buffer: for example, branch-miss events intermixed
with BTS records. This could help not just the narrow purpose of
debugging but also performance analysis.

- There is a rich, fast and efficient VFS-based API to wait for
event overflows: poll(), read(), mmap().

- Remote sampling via perfcounters is transparent to the sampled
task, while ptrace-based sampling can be seen by the application.
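As a rough illustration of the "sampling hw-branch-executions counter, with interval=1" idea, here is a minimal C sketch. It assumes the later perf_event_open(2) interface and <linux/perf_event.h> (the perfcounters syscall of this period was named perf_counter_open and was renamed afterwards), so it is not code from this thread. The helper names branch_trace_attr() and open_branch_trace() are hypothetical. Whether such an event is actually serviced by BTS is a decision of the PMU driver, not of this code.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: describe "every branch instruction" as a
 * sampling event with a fixed period of 1. */
struct perf_event_attr branch_trace_attr(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
    attr.sample_period = 1;   /* interval = 1: one sample per branch */
    attr.freq = 0;            /* fixed period, not frequency mode */
    /* Assumption: for a BTS-backed event the driver reports the branch
     * source in IP and the branch target in ADDR. TIME and CPU are the
     * extra sample fields mentioned above. */
    attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR |
                       PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
                       PERF_SAMPLE_CPU;
    attr.inherit = 1;         /* follow child tasks automatically */
    attr.exclude_kernel = 1;  /* trace user-space branches only */
    attr.disabled = 1;        /* enable later via PERF_EVENT_IOC_ENABLE */
    return attr;
}

/* Hypothetical helper: pid = 0, cpu = -1 traces the calling task on any
 * CPU; pid = -1, cpu = N traces everything on CPU N (usually privileged).
 * Returns the event fd, or -1 with errno set. */
int open_branch_trace(pid_t pid, int cpu)
{
    struct perf_event_attr attr = branch_trace_attr();

    return (int)syscall(SYS_perf_event_open, &attr, pid, cpu,
                        -1 /* no group leader */, 0UL /* flags */);
}
```

The resulting fd would then be enabled with ioctl(fd, PERF_EVENT_IOC_ENABLE, 0), and its records consumed from an mmap()ed ring buffer, using poll() for wakeups. That is the same VFS-based workflow as for any other perfcounters sampling event, which is the point of reusing one ABI.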

There are maintenance advantages as well, from the x86 architecture
and scheduler maintenance point of view:

- We would have a single facility handling the Debug Store, and
we'd have almost all pieces in place for PEBS support in
perfcounters as well, so there's good synergy.

There are performance advantages as well:

- There are lazy-switching optimizations in perfcounters that avoid
the DS buffer switching overhead.

> I take it that you don't want to implement branch tracing via
> PEBS, which would be possible but rather inefficient since the BTS
> format is much more compact than the PEBS format.

Sampling could be done via PEBS too, if someone wants to take
advantage of the instruction latency field for example on Nehalem.

But yes, I agree that for the simple case of branch-executions with
period=1 we want to use BTS, as those records are a lot more compact
than the bloated, all-general-purpose-registers records of PEBS.

> So we would still implement it via BTS and we would still like to
> present a branch trace specific format to the user.
>
> Are you suggesting to use a common ABI for sampling and branch
> tracing?

Yes, that makes sense.

> The existing ABI is tailored towards the expected users:
> debuggers. I do believe that a ptrace based interface makes a lot
> of sense for this debugging-related feature, since debuggers
> already speak ptrace.
>
> Branch tracing and sampling are used by different classes of
> user-mode applications. I don't think that a common ABI would
> benefit user-mode. Since we do need different implementations in
> the kernel, I don't see how a common ABI would help here, either.
>
> I rather see this as two independent, unrelated hardware features
> that happen to use the same technique to allow arbitrary-sized
> buffers and that therefore share some hardware real-estate.

I'd rather not maintain two separate pieces of infrastructure and
ABIs.

We had a _lot_ of problems with the BTS code, and there's still that
unresolved crash reported by akpm. (I too have reported crashes in
the past.)

That is the problem with such rarely used ABIs: almost nobody tests
them.

With perfcounters, that dynamic changes quite profoundly: the DS and
the overflow handling will be used for PEBS anyway, so there's good
overlap and sharing of facilities.

Ingo