Re: [PATCH 09/12] perf_events: add hook to flush branch_stack oncontext switch (v2)

From: Stephane Eranian
Date: Thu Dec 08 2011 - 13:04:40 EST


On Thu, Dec 8, 2011 at 2:49 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, 2011-12-07 at 10:25 -0800, Stephane Eranian wrote:
>> On Mon, Dec 5, 2011 at 1:37 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > On Fri, 2011-10-14 at 14:37 +0200, Stephane Eranian wrote:
>> >> + Â Â Â Â Â Â Â /*
>> >> + Â Â Â Â Â Â Â Â* check if the context has at least one
>> >> + Â Â Â Â Â Â Â Â* event using PERF_SAMPLE_BRANCH_STACK
>> >> + Â Â Â Â Â Â Â Â*/
>> >> + Â Â Â Â Â Â Â if (cpuctx->ctx.nr_branch_stack > 0
>> >> + Â Â Â Â Â Â Â Â Â && pmu->flush_branch_stack) {
>> >> +
>> >> + Â Â Â Â Â Â Â Â Â Â Â pmu = cpuctx->ctx.pmu;
>> >> +
>> >> + Â Â Â Â Â Â Â Â Â Â Â perf_ctx_lock(cpuctx, cpuctx->task_ctx);
>> >> +
>> >> + Â Â Â Â Â Â Â Â Â Â Â perf_pmu_disable(pmu);
>> >> +
>> >> + Â Â Â Â Â Â Â Â Â Â Â pmu->flush_branch_stack();
>> >> +
>> >> + Â Â Â Â Â Â Â Â Â Â Â perf_pmu_enable(pmu);
>> >> +
>> >> + Â Â Â Â Â Â Â Â Â Â Â perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
>> >> + Â Â Â Â Â Â Â }
>> >> + Â Â Â }
>> >
>> > (what whitespace looks funny)
>> >
>> > So all PMUs not supporting this branch stuff will fail to create a
>> > has_branch_stack() event, right? Thus all ctx with !0 nr_branch_stack
>> > support it. Doesn't this make the test for pmu->flush_branch_stack
>> > redundant?
>> >
>> >
>> No, nr_branch_stack counts the number of active events with
>> branch_stack. It's like the ctx->nr_cgroups. Processors which
>> do not support branch_stack will always have this field to 0.
>> It's not because a processor supports branch_stack that we
>> need to call flush_branch_stack(), i.e., we use a lazy approach.
>
> What you're saying is we can support branch stack and not need
> flush_branch_stack()? Say in the case the x86 LBR TOS field would be a
> full u64 counter, since then we could sample the TOS on context switch
> and filter on that, obviating the hard reset we do now.
>
The whole motivation behind the flush_branch_stack is explained in the
Changelog of the patch. In summary, we need to flush the LBR (regardless
of TOS) because in system-wide we need to be able to associate the content
of the LBR with a specific task. Given that the HW does not capture the PID
in the LBR buffer, the kernel has to intervene. Why don't we have this already?
Because we are capturing at all priv levels. But with this patchset, it becomes
possible to filter taken branches based on priv levels. Thus, if you only sample
at the user level and run in system-wide mode, it is more likely you could end
up with branches belonging to two different tasks in the LBR buffer. But you'd
have no way of determining this just by looking at the content of the buffer.
So instead, we need to flush the LBR on context switch to associate a PID
with them.

Because this is an expensive operation, we want to do this only when we
sample on LBR. That's what the ctx->nr_branch_stack is about. We could
refine that some more by checking for system-wide events with only
user priv level on the branch stack. But I did not do that yet.

Does this make more sense now?

> And the advantage of testing for the operation as opposed to putting in
> a dummy function (like we do for most other optional methods) is
> avoiding all that ctx_lock and pmu_disable muck.
>
> Fair enough.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/