Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus

From: Peter Zijlstra
Date: Mon Mar 07 2016 - 15:43:37 EST


On Mon, Mar 07, 2016 at 12:38:16PM -0500, Chris Metcalf wrote:
> On 03/07/2016 04:48 AM, Peter Zijlstra wrote:
> I'm a little skeptical that a single percpu write is going to add much
> measurable overhead to this path.

So that write is almost guaranteed to be a cacheline miss; those things
hurt and do show up on profiles.

> However, we can certainly adapt
> alternate approaches that stay away from the actual idle code.
>
> One approach (diff appended) is to just test to see if the PC is
> actually in the architecture-specific halt code. There are two downsides:
>
> 1. It requires a small amount of per-architecture support. I've provided
> the tile support as an example, since that's what I tested. I expect
> x86 is a little more complicated since there are more idle paths and
> they don't currently run the idle instruction(s) at a fixed address, but
> it's unlikely to be too complicated on any platform.
> Still, adding anything per-architecture is certainly a downside.
>
> 2. As proposed, my new alternate solution only handles the non-polling
> case, so if you are in the polling loop, we won't benefit from having
> the NMI backtrace code skip over you. However my guess is that 99% of
> the time folks do choose to run the default non-polling mode, so this
> probably still achieves a pretty reasonable outcome.
>
> A different approach that would handle downside #2 and probably make it
> easier to implement the architecture-specific code for more complicated
> platforms like x86 would be to use the SCHED_TEXT model and tag all the
> low-level idling functions as CPUIDLE_TEXT. Then the "are we idling"
> test is just a range compare on the PC against __cpuidle_text_{start,end}.
>
> We'd have to decide whether to make cpu_idle_poll() non-inline and just
> test for being in that function, or whether we could tag all of
> cpu_idle_loop() as being CPUIDLE_TEXT and just omit any backtrace
> whenever the PC is anywhere in that function. Obviously if we have
> called out to more complicated code (e.g. Daniel's concern about calling
> out to power management code) the PC would no longer be in the CPUIDLE_TEXT
> at that point, so that might be OK too.

But the CPU would also not be idle if it's running pm code.

So I like the CPUIDLE_TEXT approach, since it has no impact on the
generated code.
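
To make the range compare concrete, here is a rough userspace sketch. It
assumes the section bounds are available as a pair of addresses; in the
kernel they would presumably come from linker-provided symbols such as
the proposed __cpuidle_text_{start,end}. The struct and function names
below are hypothetical, purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the section bounds. In the proposal these
 * would be the linker symbols __cpuidle_text_start/__cpuidle_text_end.
 */
struct text_range {
	uintptr_t start;	/* first address inside the section */
	uintptr_t end;		/* one past the last address */
};

/* Return true if pc lies in the half-open interval [start, end). */
static bool pc_in_text_range(const struct text_range *r, uintptr_t pc)
{
	return pc >= r->start && pc < r->end;
}
```

The test itself is two compares on the NMI side, so nothing is added to
the idle path.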

An alternative option could be to inspect the stack. We already take a
stack dump, so you could say that everything that has cpuidle_enter() in
its callchain is an 'idle' cpu.
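
A minimal sketch of that callchain check, assuming the stack dump is
available as an array of return addresses and that the address range of
cpuidle_enter() is known. The names here are made up for illustration
and are not an existing kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range covering one function, e.g. cpuidle_enter(). */
struct func_range {
	uintptr_t start;	/* first address of the function */
	uintptr_t end;		/* one past the last address */
};

/*
 * Walk the saved callchain and report whether any entry falls inside
 * the given function. An empty callchain never matches.
 */
static bool callchain_has_func(const uintptr_t *entries, size_t nr,
			       const struct func_range *f)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (entries[i] >= f->start && entries[i] < f->end)
			return true;
	}
	return false;
}
```

Unlike the PC range compare, this still matches when the PC is in
something called from cpuidle_enter(), at the cost of walking the
trace.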

Yet another option would be to look at rq->idle_state or any other state
cpuidle already tracks. The 'obvious' downside is relying on cpuidle,
which I understand isn't supported by everyone.