Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change

From: Steven Rostedt
Date: Wed May 29 2019 - 08:43:03 EST


On Wed, 29 May 2019 12:20:38 +0200
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Wed, May 29, 2019 at 11:40:34AM +0200, Daniel Bristot de Oliveira wrote:
> > On 29/05/2019 10:33, Peter Zijlstra wrote:
> > > On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote:
> > >> The preempt_disable/enable tracepoint only traces in the disable <-> enable
> > >> case, which is correct. But think about this case:
> > >>
> > >> ---------------------------- %< ------------------------------
> > >> THREAD IRQ
> > >> | |
> > >> preempt_disable() {
> > >> __preempt_count_add(1)
> > >> -------> smp_apic_timer_interrupt() {
> > >> preempt_disable()
> > >> do not trace (preempt count >= 1)
> > >> ....
> > >> preempt_enable()
> > >> do not trace (preempt count >= 1)
> > >> }
> > >> trace_preempt_disable();
> > >> }
> > >> ---------------------------- >% ------------------------------
> > >>
> > >> The tracepoint will be skipped.
> > >
> > > .... for the IRQ. But IRQs are not preemptible anyway, so what the
> > > problem?
> >
> >
> > right, they are.
> >
> > exposing my problem in a more specific way:
> >
> > To show in a model that an event always takes place with preemption disabled,
> > but not necessarily with IRQs disabled, it is worth having the preemption
> > disable events separated from IRQ disable ones.
> >
> > The main reason is that, although IRQs disabled postpone the execution of the
> > scheduler, it is more pessimistic, as it also delays IRQs. So the more precise
> > the model is, the less pessimistic the analysis will be.
>
> I'm not sure I follow, IRQs disabled fully implies !preemptible. I don't
> see how the model would be more pessimistic than reality if it were to
> use this knowledge.
>
> Any !0 preempt_count(), which very much includes (Hard)IRQ and SoftIRQ
> counts, means non-preemptible.

I believe I see what Daniel is talking about, but I hate the proposed
solution ;-)

First, if you care about real times that the CPU can't preempt
(preempt_count != 0 or interrupts disabled), then you want the
preempt_irqsoff tracer. The preempt_tracer is more academic where it
just shows you when we disable preemption via the counter. But even
with the preempt_irqsoff tracer you may not get the full length of time
due to the above explained race.

__preempt_count_add(1) <-- CPU now prevents preemption

<interrupt>
---->
trace_hardirqs_off() <-- Start preempt disable timer
handler();
trace_hardirqs_on() <-- Stop preempt disable timer (Diff A)
<----
trace_preempt_disable() <-- Start preempt disable timer

[..]

trace_preempt_enable() <-- Stop preempt disable timer (Diff B)

__preempt_count_sub(1) <-- CPU re-enables preemption

The preempt_irqsoff tracer will break this into two parts: Diff A and
Diff B, when in reality, the total time the CPU prevented preemption
was A + B.

Note, we can have the same race if an interrupt came in just after
calling trace_preempt_enable() and before the __preempt_count_sub().

What I would recommend is adding a flag to the task_struct that gets
set before the __preempt_count_add() and cleared by the tracing
function. If an interrupt goes off during this time, it will start the
total time to record, and not end it on the trace_hardirqs_on() part.
Now since we set this flag before disabling preemption, what if we get
preempted before calling __preempt_count_add()?. Simple, have a hook in
the scheduler (just connect to the sched_switch tracepoint) that checks
that flag, and if it is set, it ends the preempt disable recording
time. Also on scheduling that task back in, if that flag is set, start
the preempt disable timer.

It may get a bit more complex, but we can contain that complexity
within the tracing code, and we don't have to do added irq disabling.

-- Steve