Re: IV.5 - Intel Last Branch Record (LBR)

From: stephane eranian
Date: Mon Jun 22 2009 - 16:02:51 EST


On Mon, Jun 22, 2009 at 2:01 PM, Ingo Molnar <mingo@xxxxxxx> wrote:
>> 5/ Intel Last Branch Record (LBR)
>>
>> Intel processors since NetBurst have a cyclic buffer hosted in
>> registers which can record taken branches. Each taken branch is
>> stored into a pair of LBR registers (source, destination). Up
>> until Nehalem, there were no filtering capabilities for LBR. LBR
>> is not an architected PMU feature.
>>
>> There is no counter associated with LBR. Nehalem has an LBR_SELECT
>> MSR. However, there are some constraints on it, given that it is
>> shared by the hyper-threads of a core.
>>
>> LBR is only useful when sampling and therefore must be combined
>> with a counter. LBR must also be configured to freeze on PMU
>> interrupt.
>>
>> How is LBR going to be supported?
>
> If there's interest, then one sane way to support it would be to
> expose it as a new sampling format (PERF_SAMPLE_*).
>
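To make the PERF_SAMPLE_* idea concrete, here is a minimal sketch of how a branch stack could be appended to a sample record: a 64-bit count followed by that many (source, destination) pairs. The struct and function names (`lbr_entry`, `sample_put_branch_stack`) and the layout itself are hypothetical, not an existing perf interface.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one LBR entry as captured from the hardware. */
struct lbr_entry {
    uint64_t from;  /* branch source address */
    uint64_t to;    /* branch destination address */
};

/*
 * Append a branch stack to a sample buffer as a u64 count followed by
 * 'nr' (from, to) pairs. Returns the number of bytes written, or 0 if
 * the buffer is too small. Name and layout are illustrative only.
 */
size_t sample_put_branch_stack(uint8_t *buf, size_t cap,
                               const struct lbr_entry *e, uint64_t nr)
{
    size_t need = sizeof(uint64_t) + nr * sizeof(struct lbr_entry);

    if (need > cap)
        return 0;
    memcpy(buf, &nr, sizeof(nr));                        /* count first */
    memcpy(buf + sizeof(nr), e, nr * sizeof(struct lbr_entry));
    return need;
}
```

With two entries the record occupies 8 + 2 * 16 = 40 bytes; a count-prefixed layout lets tools skip the stack without knowing the hardware LBR depth.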
LBR is very important; it becomes usable with Nehalem, where you
can filter on privilege level. It is important for statistical basic block
profiling, for instance. Another important feature is its ability to freeze
on PMU interrupt.
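Basic block profiling works because consecutive LBR entries bracket straight-line code: execution arrives at one branch's destination and runs without taking a branch until it reaches the next recorded branch's source. Below is a minimal sketch. It assumes the entries are already ordered oldest to newest and that every taken branch in the region of interest was recorded; branch-type filtering would break this assumption. `code_range` and `lbr_to_ranges` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

struct lbr_entry { uint64_t from; uint64_t to; };

/* A straight-line code range [start, end] executed without a taken branch. */
struct code_range { uint64_t start; uint64_t end; };

/*
 * Given 'n' LBR entries ordered oldest to newest, derive the code ranges
 * executed sequentially between them: execution lands at e[i].to and falls
 * through until the next taken branch at e[i + 1].from. A range whose end
 * precedes its start cannot be fall-through (e.g. a branch was not
 * recorded) and is skipped. Returns the number of ranges stored in 'out'.
 */
size_t lbr_to_ranges(const struct lbr_entry *e, size_t n,
                     struct code_range *out)
{
    size_t k = 0;

    for (size_t i = 0; i + 1 < n; i++) {
        uint64_t start = e[i].to;
        uint64_t end = e[i + 1].from;

        if (end < start)
            continue;           /* not a plausible straight-line run */
        out[k].start = start;
        out[k].end = end;
        k++;
    }
    return k;
}
```

Each sample then contributes one hit to every range it yields, and the accumulated counts approximate basic block execution frequencies.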

LBR is also interesting because it yields the path (the sequence of taken
branches) leading to an event.
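Since the hardware writes the buffer circularly, recovering that path means starting at the slot LBR_TOS points to and walking backwards. This is a sketch, assuming TOS indexes the most recently recorded branch and that the LBR MSR contents have already been copied into an array (reading them needs rdmsr in the kernel, which is omitted here). `lbr_path_to_event` is a hypothetical helper.

```c
#include <stdint.h>
#include <stddef.h>

struct lbr_entry { uint64_t from; uint64_t to; };

/*
 * Copy an LBR ring of 'n' entries into 'path' in chronological order
 * (oldest first, newest last). 'tos' (< n) is assumed to index the most
 * recently recorded branch, so the newest entry is ring[tos] and older
 * entries are found by walking backwards, wrapping around modulo n.
 */
void lbr_path_to_event(const struct lbr_entry *ring, size_t n, size_t tos,
                       struct lbr_entry *path)
{
    for (size_t i = 0; i < n; i++) {
        size_t idx = (tos + n - i) % n;   /* i-th most recent entry */
        path[n - 1 - i] = ring[idx];
    }
}
```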

LBR on NHM (and others) is not that easy to handle because:
- you need to read-modify-write IA32_DEBUGCTL
- LBR_TOS, the position pointer, is read-only
- LBR_SELECT, which configures LBR, is shared at the core level on NHM

but it is very much worthwhile.
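Here is a sketch of the IA32_DEBUGCTL read-modify-write, which also covers the freeze-on-PMU-interrupt setting mentioned above. It assumes the bit positions given in the Intel SDM (bit 0 LBR, bit 11 FREEZE_LBRS_ON_PMI). It passes the MSR value in and out rather than touching the hardware, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Bit positions in IA32_DEBUGCTL (MSR 0x1D9), per the Intel SDM. */
#define DEBUGCTL_LBR                 (1ULL << 0)   /* enable LBR recording */
#define DEBUGCTL_FREEZE_LBRS_ON_PMI  (1ULL << 11)  /* freeze LBR on PMI */

/*
 * Return a new IA32_DEBUGCTL value with LBR enabled and freeze-on-PMI set,
 * preserving every other bit. In a kernel this would sit between a read
 * and a write of the MSR; here the value is passed in so the logic can be
 * shown without privileged instructions.
 */
uint64_t debugctl_enable_lbr(uint64_t old)
{
    return old | DEBUGCTL_LBR | DEBUGCTL_FREEZE_LBRS_ON_PMI;
}

/* Clear only the LBR-related bits, again leaving the rest untouched. */
uint64_t debugctl_disable_lbr(uint64_t old)
{
    return old & ~(DEBUGCTL_LBR | DEBUGCTL_FREEZE_LBRS_ON_PMI);
}
```

Only the two LBR-related bits change. That is the point of doing a read-modify-write: other features such as BTS share this MSR.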

> Regarding the constraints - if we choose to expose the branch-type
> filtering capabilities of Nehalem, then that puts a constraint on
> counter scheduling: two counters with conflicting constraints should
> not be scheduled at once, but should be time-shared via the usual
> mechanism.
>
You need to expose the branch filtering in some way. The return
branch filter is useful for statistical call graph sampling, for instance.
If you can't disable the other types of branches, then the relevance
of the data drops.
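To illustrate why disabling branch types matters, here is how a return-only, user-level filter could be encoded. The sketch assumes the Nehalem LBR_SELECT layout as described in the Intel SDM, where a set bit suppresses a branch class. Under that layout, the value to program is the complement of the classes you want to keep. The constant and function names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed Nehalem LBR_SELECT bits: each set bit SUPPRESSES recording of
 * that class of branch (verify against the Intel SDM).
 */
enum {
    LBR_SEL_CPL_EQ_0      = 1u << 0,  /* branches taken at ring 0 */
    LBR_SEL_CPL_NEQ_0     = 1u << 1,  /* branches taken at ring > 0 */
    LBR_SEL_JCC           = 1u << 2,  /* conditional branches */
    LBR_SEL_NEAR_REL_CALL = 1u << 3,  /* near relative calls */
    LBR_SEL_NEAR_IND_CALL = 1u << 4,  /* near indirect calls */
    LBR_SEL_NEAR_RET      = 1u << 5,  /* near returns */
    LBR_SEL_NEAR_IND_JMP  = 1u << 6,  /* near indirect jumps */
    LBR_SEL_NEAR_REL_JMP  = 1u << 7,  /* near relative jumps */
    LBR_SEL_FAR_BRANCH    = 1u << 8,  /* far branches */
    LBR_SEL_ALL           = (1u << 9) - 1
};

/*
 * Build an LBR_SELECT value that records only the classes passed in
 * 'keep'. Because the hardware bits mean "suppress", the result is the
 * complement of 'keep' within the 9 defined bits.
 */
uint32_t lbr_select_keep(uint32_t keep)
{
    return LBR_SEL_ALL & ~keep;
}
```

For example, `lbr_select_keep(LBR_SEL_NEAR_RET | LBR_SEL_CPL_NEQ_0)` evaluates to 0x1DD. In other words, only near returns taken outside ring 0 are recorded.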

> The typical use-case would be to have no or compatible LBR filter
> attributes between counters though - so having the conflicts is not
> an issue as long as it works according to the usual model.
>
A conflict arises when two events request different filter values. The conflict
happens in both per-thread and per-CPU mode when HT is on. In per-CPU
mode it can be controlled at the core level. But in per-thread mode, it
has to be managed globally, as a thread may migrate. Multiplexing,
as it is implemented, manages groups attached to one thread or CPU.
Here it would have to look across a pair of CPUs (or threads) and
ensure that no two groups using LBR with conflicting filter selections
can be scheduled at the same time on the two hyper-threads of the
same core. I think it may be easier, for now, to say LBR_SELECT
is global: first come, first served.
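The first-come, first-served policy can be expressed as a small ownership record. This is a minimal sketch under a few assumptions: one record covers the whole system (the global case proposed above), the filter is represented by its raw LBR_SELECT value, and locking is left out. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical global owner of LBR_SELECT. A real kernel version would
 * need locking or atomics around this state; that is omitted here.
 */
struct lbr_select_owner {
    uint32_t value;   /* filter value currently in force */
    unsigned users;   /* number of events sharing that value */
};

/* Try to claim LBR_SELECT with 'value'; returns true on success. */
bool lbr_select_get(struct lbr_select_owner *o, uint32_t value)
{
    if (o->users == 0) {
        o->value = value;   /* first come: this event picks the filter */
        o->users = 1;
        return true;
    }
    if (o->value != value)
        return false;       /* conflicting filter: refuse the event */
    o->users++;             /* compatible filter: share it */
    return true;
}

/* Drop one user; the filter is free again once no users remain. */
void lbr_select_put(struct lbr_select_owner *o)
{
    if (o->users > 0)
        o->users--;
}
```

Under this policy, an event refused here would simply fail to get LBR rather than being multiplexed. That avoids having to coordinate scheduling across sibling hyper-threads.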