Re: [PATCH 0/3] tracepoints: delay argument evaluation

From: Ingo Molnar
Date: Wed May 20 2009 - 03:34:35 EST

* Jason Baron <jbaron@xxxxxxxxxx> wrote:

> hi,
>
> After disassembling some of the tracepoints, I've noticed that
> arguments that are passed as macros, or that perform dereferences,
> are evaluated before the tracepoint on/off check. This means that we
> are needlessly impacting the off case.
>
> I am proposing to fix this by adding a macro that first checks for
> on/off and then calls 'trace_##name', preserving type checking.
> Thus, callsites have to move from:
>
> trace_block_bio_complete(md->queue, bio);
>
> to:
>
> tracepoint_call(block_bio_complete, md->queue, bio);
>
> I've tried '__always_inline', but that did not fix this issue.
> Obviously this change will require changes to all the callsites.
> But that shouldn't be very hard; I've already included the
> scheduler and block changes with this patch. I think it's important
> to minimize code execution in the off case, and thus going through
> all the callsites is well worth it. If we agree on this change, I
> can change the rest in very short order.
>
> Below I'm also showing the assembly in the 'dec_pending()'
> function before and after this change to show the difference it
> makes. The arguments to the tracepoint are as above, 'md->queue'
> and 'bio'. Notice the 2 extra instructions before the initial
> 'je' that could be moved after the 'je'.
>
> before:
>
> ffffffff8137b2a3: 83 3d de 90 4b 00 00 cmpl $0x0,0x4b90de(%rip) # ffffffff81834388 <__tracepoint_block_bio_complete+0x8>
> ffffffff8137b2aa: 49 8b 45 50 mov 0x50(%r13),%rax
> ffffffff8137b2ae: 48 89 45 d0 mov %rax,-0x30(%rbp)
> ffffffff8137b2b2: 74 1f je ffffffff8137b2d3 <dec_pending+0x101>
>
> after:
>
> ffffffff8137b2a3: 83 3d de 90 4b 00 00 cmpl $0x0,0x4b90de(%rip) # ffffffff81834388 <__tracepoint_block_bio_complete+0x8>
> ffffffff8137b2aa: 74 27 je ffffffff8137b2d3 <dec_pending+0x101>

hm, this is really a compiler bug in essence - the compiler should
delay the construction of arguments into unlikely branches - if the
arguments are only used there.

We'd basically be open-coding a clear-cut call:

trace_block_bio_complete(md->queue, bio);

into this form:

trace(block_bio_complete, md->queue, bio);

.. and this latter form could become moot (and a nuisance) if the
compiler is fixed.

Have you tried the very latest GCC? Does it still have this optimization
problem?

Note that the compiler getting this right would help a _lot_ of
other inline functions in the kernel as well. Arguments only used
within unlikely() branches are quite common.

Ingo