Re: [PATCH 2/2] Markers Implementation for Preempt RCU BoostTracing

From: Ingo Molnar
Date: Wed Jan 02 2008 - 07:49:09 EST



* Frank Ch. Eigler <fche@xxxxxxxxxx> wrote:

> Ingo Molnar <mingo@xxxxxxx> writes:
>
> > [...] Firstly, why on earth does a full format string have to be
> > passed in for something as simple as a CPU id? This way we basically
> > codify it forever that tracing _has_ to be expensive when
> > enabled. [...]
>
> FWIW, I'm not keen about the format strings either, but they don't
> constitute a performance hit beyond an additional parameter. It does
> not need to actually get parsed at run time.

"only" an additional parameter. The whole _point_ behind these markers
is for them to have minimal effect!

> >[...]
> > Secondly, the inlined overhead of trace_mark() is still WAY too large:
> >
> > if (unlikely(__mark_##name.state)) { \
> > [...]
> > } \
>
> Note that this is for the unoptimized case. The immediate-value code
> is better. I have still yet to see some good measurements of how much
> the overheads of the various variants are, however. It's only fair to
> gather these numbers and continue the debate with them in hand.

this is a general policy matter. It is _so much easier_ to add markers
if they _can_ have near-zero overhead (as in 1-2 instructions).
Otherwise we'll keep arguing about it, especially if any is added to
performance-critical codepath. (where we are counting instructions)

> > Whatever became of the obvious suggestion that i outlined years ago,
> > to have a _single_ trace call instruction and to _patch out_ the
> > damn marker calls by default? [...] only leaving a ~5-byte NOP
> > sequence behind them (and some minimal disturbance to the variables
> > the tracepoint accesses). [...]
>
> This has been answered several times before. It's because the marker
> parameters have to be (conditionally) evaluated and pushed onto a call
> frame. It's not just a call that would need being nop'd, but a whole
> function call setup/teardown sequence, which itself can be interleaved
> with adjacent code.

you are still missing my point. Firstly, the kernel is regparm built so
there's no call frame to be pushed to in most cases - we pass most
parameters in registers. (hence my stressing of minimizing the number of
parameters to an absolute minimum)

Secondly, the side-effects of a function call is what i referred to via:

> [...] (and some minimal disturbance to the variables the tracepoint
> accesses). [...]

really, a tracepoint obviously accesses data that is readily accessible
in that spot. Worst-case we'll have some additional register constraints
that make the code a bit less optimal.

Yes, if a trace point references data that is _not_ readily available,
then of course the preparation for the function call might not be cheap.
But that can be optimized by placing the tracepoints intelligently.

what cannot be optimized away at all are the conditional instructions
introduced by the probe points, extra parameters and the space overhead
of the function call itself.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/