Re: [PATCH 19/21] tracing: Account for preempt off in preempt_schedule()

From: Steven Rostedt
Date: Fri Sep 23 2011 - 08:24:42 EST


On Fri, 2011-09-23 at 13:22 +0200, Peter Zijlstra wrote:
> On Fri, 2011-09-23 at 07:19 -0400, Steven Rostedt wrote:
>
> > What would you suggest? Just ignore the latencies that schedule
> > produces, even though its been one of the top causes of latencies?
>
> I would like to actually understand the issue first.. so far all I've
> got is confusion.

Simple. The preemptoff and preemptirqsoff latency tracers record
every time preemption is disabled and enabled. For preemptoff, this only
covers modifications of the preempt count. For preemptirqsoff, it
covers both preempt count changes and interrupts being
disabled/enabled.


Currently, the preempt check is done in add/sub_preempt_count(). But in
preempt_schedule() we call add/sub_preempt_count_notrace(), which update
the preempt count directly without any of the preempt off/on checks.

The changelog I referenced talked about why we use the notrace versions:
some function tracing hooks use preempt_enable/disable_notrace(), and
the function tracer is not the only user of the function tracing
facility. With the original preempt_disable(), when preempt tracing is
enabled, the add/sub_preempt_count() calls themselves get traced by the
function tracer (which is also a good thing, as I've used that info).
The issue is in preempt_schedule(), which is called by preempt_enable()
when NEED_RESCHED is set and PREEMPT_ACTIVE is not. One of the first
things preempt_schedule() does is call add_preempt_count(PREEMPT_ACTIVE),
to add PREEMPT_ACTIVE to the preempt count so that we do not come back
into preempt_schedule() when interrupted again.

But! If add_preempt_count(PREEMPT_ACTIVE) is traced, we call into the
function tracing mechanism *before* it adds PREEMPT_ACTIVE. When the
function hook then calls preempt_enable_notrace(), it notices
NEED_RESCHED set and PREEMPT_ACTIVE not set, recurses back into
preempt_schedule(), and boom!

By making preempt_schedule() use the notrace versions we avoid this
issue with the function tracing hooks, but in the meantime we just lost
the check that preemption was disabled. Since we know that preemption
and interrupts were both enabled before calling into preempt_schedule()
(otherwise it is a bug), we can simply tell the latency tracers by hand
that preemption is being disabled, using start/stop_critical_timings().
Note, these function names come from the original latency_tracer that
was in -rt.

There's another location in the kernel where we need to manually call
into the latency tracer, and that's in idle. cpu_idle() disables
preemption, then disables interrupts, and may execute some assembly
instruction that puts the system into idle but wakes up on interrupts.
Then on return, interrupts are enabled and preemption is enabled again.

Since the latency tracers don't know about this wakeup on interrupts,
they would count the idle wait as a latency, which is obviously not what
we want. That is what start/stop_critical_timings() was created for. The
preempt_schedule() case is similar, but in the opposite direction:
instead of not wanting to trace, we want to trace, and the same code
works for this location too.

-- Steve

