Re: [PATCH v2] [LBR] Dump LBRs on Exception

From: Andy Lutomirski
Date: Sun Dec 07 2014 - 14:10:31 EST


On Sun, Dec 7, 2014 at 10:40 AM, Robert Jarzmik
<robert.jarzmik@xxxxxxxxx> wrote:
> Hi Andy,
>
> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
>> On Dec 6, 2014 2:31 AM, "Robert Jarzmik" <robert.jarzmik@xxxxxxxxx> wrote:
>>> We would have a "LBR resource" variable to track who owns the LBR :
>>> - nobody : LBR_UNCLAIMED
>>> - the exception handler : LBR_EXCEPTION_DEBUG_USAGE
>>
>> Which exception handler? There can be several on the stack.
> All of them, ie. LBR is used by exception handlers, ie. perf cannot use it, just
> as what Emmanuel's patch is doing I think. Or said differently LBR are reserved
> for exception handlers only, whichever have the implementation to use them.
>
>>> - case 3d: kernel exception with a reschedule inside
>>> -> exception entry
>>> -> test lbr_dump_state == EXCEPTION_OWNED => true => STOP LBR
>>> -> exception handling
>>> -> context_switch()
>>> -> perf cannot touch LBR, nobody can
>>> -> test lbr_dump_state == EXCEPTION_OWNED => true => START LBR
>>
>> Careful. This is still the nested exception, and it just did the wrong thing.
> Can you be more explicit about the "wrong" thing ? And would that wrong thing be
> solved by a per-cpu reference counter ?

Suppose you have an int3 with a page fault inside. If the int3
disabled LBR, then the int3 should re-enable it, and the page fault
should not. This means that, if the inner page fault is, in fact, an
OOPS, then you don't get the LBR trace.

A per-cpu reference counter would solve it. So would using rdmsr
instead of wrmsr, because there would be nothing to re-enable. (The
latter also means that both exceptions get the LBR trace if they turn
out to be OOPSes.)

But a per-cpu reference counter still has the per-cpu issue below.
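
To make the nesting problem concrete, here is a minimal user-space sketch in
C. It is only a model under stated assumptions: lbr_enabled stands in for the
LBR enable bit (a real kernel would use an MSR write), lbr_stop_depth stands
in for the per-cpu counter, and every function name is hypothetical. It
compares an unconditional stop/start pair with a counted pair for an int3
that takes a page fault inside it. It does not model the per-cpu access
problem at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a real kernel would touch an MSR, not a bool. */
static bool lbr_enabled;    /* models the LBR enable bit */
static int lbr_stop_depth;  /* models a per-cpu nesting counter */

/* Naive scheme: every exception entry stops LBR, every exit restarts it. */
static void naive_stop(void)  { lbr_enabled = false; }
static void naive_start(void) { lbr_enabled = true; }

/* Counted scheme: only the outermost entry/exit touches the enable bit. */
static void counted_stop(void)
{
	if (lbr_stop_depth++ == 0)
		lbr_enabled = false;
}

static void counted_start(void)
{
	assert(lbr_stop_depth > 0);
	if (--lbr_stop_depth == 0)
		lbr_enabled = true;
}

/*
 * Simulate an int3 with a page fault inside it. Returns whether LBR is
 * recording after the inner page fault has returned but while the outer
 * int3 handler is still running.
 */
static bool lbr_on_after_inner_fault(void (*stop)(void), void (*start)(void))
{
	bool seen;

	lbr_enabled = true;
	lbr_stop_depth = 0;

	stop();              /* int3 entry */
	stop();              /* nested page fault entry */
	start();             /* nested page fault exit */
	seen = lbr_enabled;  /* state seen by the rest of the int3 handler */
	start();             /* int3 exit */

	return seen;
}
```

Calling lbr_on_after_inner_fault(naive_stop, naive_start) returns true: the
inner exit turned LBR back on while the outer handler was still running. With
counted_stop and counted_start it returns false, and LBR is enabled again
only after the outermost exit.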

>
>>> I might be very wrong in the description as I'm not that sharp on x86, but is
>>> there a flaw in the above cases ?
>>>
>>> If not, a couple of tests and Thomas's per-cpu variable can solve the issue,
>>> while keeping the exception handler code simple as Emmanuel has proposed
>>> (given the additional test inclusion - which will be designed to not pollute
>>> the LBR), and having a small impact on perf to solve the resource acquire
>>> issue.
>>
>> On current kernels, percpu memory is vmalloced, so accessing it can fault, so
>> you can't touch percpu memory at all from page_fault until the vmalloc fixup
>> runs. Sorry :(
> What about INIT_PER_CPU_VAR (as in gdt_page) ? Won't that be mapped all the time
> without need for faulting in pages ?

I'm not sure. It may not be if CPUs are hotplugged.

>
>> This is a problem with rdmsr, too.
> You mean rdmsr can fault in a non-hypervisor environment ? Because that
> definitely opens a new range of corner cases.
>
>> It may be worth fixing that. In fact, it may be worth getting rid of lazy vmap
>> entirely.
> Your battle ? ;)
>
> Anyway, would a static per-cpu variable (or variables, one about resources
> usage, one reference counter) solve our cases (ie. 3d) ?
>

Possibly, but only if static per-cpu reference counters are safe to
touch in the exception entry code.

Tejun?

--Andy