Re: Unified tracing buffer

From: Linus Torvalds
Date: Tue Sep 23 2008 - 00:06:56 EST

On Mon, 22 Sep 2008, Mathieu Desnoyers wrote:
>
> Unless I am missing something, in the case we use an atomic operation
> which implies memory barriers (cmpxchg and atomic_add_return does), one
> can be sure that all memory operations done before the barrier are
> completed at the barrier and that all memory ops following the barrier
> will happen after.

Sure (if you have a barrier - not all architectures will imply that for an
increment).

But that still doesn't mean a thing.

You have two events (a) and (b), and you put trace-points on each. In your
trace, you see (a) before (b) by comparing the numbers. But what does that
mean?

The actual event that you traced is not the trace-point - the trace-point
is more like a fancy "printk". And the fact that one showed up before
another in the trace buffer doesn't mean that the events _around_ the
trace happened in the same order.

You can use the barriers to make a partial ordering, and if you have a
separate tracepoint for entry into a region and exit, you can perhaps show
that they were totally disjoint. Or maybe they were partially overlapping,
and you'll never know exactly how they overlapped.
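To make the entry/exit point concrete, here is a minimal sketch of the only
comparison such a trace supports. Everything in it is hypothetical: the
struct, the enum and classify_spans() are names made up for illustration, and
it assumes each region recorded one sequence number at entry and one at exit
from a counter that is updated with full barriers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record: sequence numbers taken at region entry and exit. */
struct region_span {
	uint64_t enter_seq;
	uint64_t exit_seq;
};

enum span_relation {
	SPAN_A_BEFORE_B,	/* A's exit was traced before B's entry */
	SPAN_B_BEFORE_A,	/* B's exit was traced before A's entry */
	SPAN_OVERLAP		/* the regions overlapped; order inside is unknown */
};

/* Classify two traced regions. Only disjoint regions get a real order. */
static enum span_relation classify_spans(struct region_span a,
					 struct region_span b)
{
	assert(a.enter_seq <= a.exit_seq);
	assert(b.enter_seq <= b.exit_seq);

	if (a.exit_seq < b.enter_seq)
		return SPAN_A_BEFORE_B;
	if (b.exit_seq < a.enter_seq)
		return SPAN_B_BEFORE_A;
	return SPAN_OVERLAP;
}
```

Note that SPAN_OVERLAP is a real answer, not an error: it is the case where
the trace cannot tell you how the two regions interleaved.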

Example:

	trace(..);
	do_X();

being executed on two different CPUs. In the trace, CPU#1 was before
CPU#2. Does that mean that "do_X()" happened first on CPU#1?

No.

The only way to show that would be to put a lock around the whole trace
_and_ operation X, ie

	spin_lock(..);
	trace(..);
	do_X();
	spin_unlock(..);

and now, if CPU#1 shows up in the trace first, then you know that do_X()
really did happen first on CPU#1. Otherwise you basically know *nothing*,
and the ordering of the trace events was totally and utterly meaningless.
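As a rough user-space illustration of the locked case, the sketch below uses
a pthread mutex in place of the spinlock and two plain counters in place of
trace(..) and do_X(). All the names (demo_lock, worker, run_locked_demo) are
invented for the example. Because both counters only advance inside the same
critical section, the position in the "trace" always matches the order in
which the "operation" ran.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NITER    1000

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long trace_seq;		/* position in the trace */
static unsigned long op_seq;		/* order in which "do_X()" really ran */
static unsigned long mismatches;	/* times the two orders disagreed */

static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < NITER; i++) {
		pthread_mutex_lock(&demo_lock);
		unsigned long t = trace_seq++;	/* stands in for trace(..) */
		unsigned long x = op_seq++;	/* stands in for do_X()    */
		if (t != x)
			mismatches++;
		pthread_mutex_unlock(&demo_lock);
	}
	return NULL;
}

/* Run the workers; return how often trace order and op order differed. */
static unsigned long run_locked_demo(void)
{
	pthread_t tid[NTHREADS];

	trace_seq = op_seq = mismatches = 0;
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	return mismatches;
}
```

With the lock held across both steps, run_locked_demo() returns 0. Take the
second step out of the critical section and the two orders are free to
disagree, which is exactly the unlocked case described above.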

See? Trace events themselves may be ordered, but the point of the trace
event is never to know the ordering of the trace itself - it's to know the
ordering of the code we're interested in tracing. The ordering of the
trace events themselves is irrelevant and not useful.

And I'd rather see people _understand_ that than think the ordering is
somehow something they can trust.

Btw, if you _do_ have locking, then you can also know that the "do_X()"
operations will be at least as far apart in some theoretical notion of
"time" (let's imagine that we do have global time, even if we don't) as
the cost of the trace operation and do_X() itself.

So if we _do_ have locking (and thus a valid ordering that actually can
matter), then the TSC doesn't even have to be synchronized on a cycle
basis across CPUs - it just needs to be close enough that you can tell
which one happened first (and with ordering, that's a valid thing to do).

So you don't even need "perfect" synchronization, you just need something
reasonably close, and you'll be able to see ordering from TSC counts
without having that horrible bouncing cross-CPU thing that will impact
performance a lot.
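A sketch of what "reasonably close" could mean in practice: compare two
timestamps and only claim an order when they differ by more than the worst
skew you are willing to assume between CPUs. The function name tsc_order()
and the max_skew parameter are assumptions made for this example, not an
existing interface.

```c
#include <stdint.h>

/*
 * Compare two per-CPU timestamps taken under the same lock.
 * Returns -1 if a is clearly first, 1 if b is clearly first, and
 * 0 if they are within max_skew cycles and no order can be claimed.
 */
static int tsc_order(uint64_t tsc_a, uint64_t tsc_b, uint64_t max_skew)
{
	uint64_t diff = tsc_a > tsc_b ? tsc_a - tsc_b : tsc_b - tsc_a;

	if (diff <= max_skew)
		return 0;
	return tsc_a < tsc_b ? -1 : 1;
}
```

If the locked trace-plus-do_X() sequence always costs more than max_skew
cycles, two events from the same lock should never come back as 0.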

Quite frankly, I suspect that anybody who wants to have a global counter
might as well almost just have a global ring-buffer. The trace events
aren't going to be CPU-local anyway if you need to always update a shared
cacheline - and you might as well make the shared cacheline be the ring
buffer head with a spinlock in it.

That may not be _quite_ true, but it's probably close enough.
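For illustration only, a global ring buffer whose head carries its own lock
might look roughly like the following. It is a user-space sketch using a C11
atomic_flag as a stand-in for a spinlock; struct trace_ring, global_ring and
trace_write() are hypothetical names, and this is not the kernel's ring
buffer.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8	/* must be a power of two */

struct trace_ring {
	atomic_flag lock;		/* spinlock living next to the head */
	uint64_t head;			/* total number of events written */
	uint32_t events[RING_SIZE];	/* oldest entries get overwritten */
};

static struct trace_ring global_ring = { .lock = ATOMIC_FLAG_INIT };

/* Append one event; the returned value doubles as a global sequence number. */
static uint64_t trace_write(struct trace_ring *r, uint32_t event)
{
	/* Spin until we own the lock. */
	while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		;

	uint64_t seq = r->head++;
	r->events[seq & (RING_SIZE - 1)] = event;

	atomic_flag_clear_explicit(&r->lock, memory_order_release);
	return seq;
}
```

Every writer bounces the same cacheline here, which is the cost a global
counter pays anyway; the difference is that the lock also covers the write
into the buffer.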

Linus