Re: [patch 09/17] LTTng instrumentation - filemap
From: Nick Piggin
Date: Thu Jul 17 2008 - 03:12:26 EST
On Thursday 17 July 2008 17:02, Mathieu Desnoyers wrote:
> * Nick Piggin (nickpiggin@xxxxxxxxxxxx) wrote:
> > On Wednesday 16 July 2008 08:26, Mathieu Desnoyers wrote:
> > > Instrumentation of waits caused by memory accesses on mmap regions.
> > >
> > > Those tracepoints are used by LTTng.
> > >
> > > About the performance impact of tracepoints (which is comparable to
> > > that of markers): even without the immediate values optimizations, tests
> > > done by Hideo Aoki on ia64 show no regression. His test case was running
> > > hackbench on a kernel where scheduler instrumentation (about 5 events
> > > in the scheduler code) was added. See the "Tracepoints" patch header
> > > for detailed performance results.
> >
> > BTW. this sort of test is practically useless to measure overhead. If
> > a modern CPU is executing out of primed insn/data and branch prediction
> > cache, then yes this sort of thing is pretty well free.
> >
> > I see *real* workloads that have got continually and incrementally slower,
> > e.g. from 2.6.5 to 2.6.20+, as "features" get added. Surprisingly, none of
> > them ever showed up individually on a microbenchmark.
> >
> > OK, for this case, if you can configure it out, I guess that's fine. But
> > let's not pretend that adding code and branches and function calls is
> > ever free.
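
To make the "configure it out" case concrete, here is a minimal sketch using
a hypothetical option and macro name (CONFIG_EXAMPLE_TRACEPOINTS and
trace_example() are made up, not the real Kconfig symbol or tracepoint
macro); when the option is off, the call site compiles to nothing:

#include <stdio.h>

/* Hypothetical sketch, not the actual tracepoint infrastructure. */
#ifdef CONFIG_EXAMPLE_TRACEPOINTS
#define trace_example(arg)	printf("trace event: %d\n", (arg))
#else
/* Configured out: the call site compiles away entirely -- no code, no
 * branch, no d-cache or branch-predictor footprint remains. */
#define trace_example(arg)	do { } while (0)
#endif

int main(void)
{
	trace_example(7);	/* disappears unless the option is enabled */
	return 0;
}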
>
> I never pretended anything like that.

OK, but saying "there is no detectable impact when running hackbench" is
basically meaningless.

> Actually, that's what the "immediate values" are for: they allow patching
> in a load of an immediate value instead of a memory read, to decrease the
> d-cache impact. They now also allow patching in a jump instead of the
> memory read/immediate value read + test + conditional branch used to skip
> the function call, with fairly minimal impact.
> I agree with you that eating precious d-cache and branch prediction buffer
> (BPB) entries can eventually slow down the system. But this will be _hard_
> to show on a single macro benchmark, and a microbenchmark showing it will
> have to be run under conditions that exacerbate the d-cache and BPB
> impact.
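
To make the trade-off concrete, here is a minimal user-space sketch of the
call-site pattern being discussed. The names (trace_wait_on_page,
probe_wait_on_page, wait_on_page_trace_enabled) are hypothetical and this is
not the actual marker/tracepoint code; it only illustrates why the plain
memory-flag scheme touches the d-cache and branch predictor on the disabled
fast path, and what the patched-immediate/jump variants avoid:

/*
 * Hypothetical sketch, not the LTTng implementation.
 */
#include <stdio.h>

#define unlikely(x)	__builtin_expect(!!(x), 0)

/*
 * Plain scheme: the enable flag lives in ordinary memory, so every call
 * site performs a data load, a test and a conditional branch even while
 * tracing is disabled -- the load can evict a useful d-cache line and the
 * branch occupies a predictor entry.
 */
static int wait_on_page_trace_enabled;

static void probe_wait_on_page(unsigned long index)
{
	printf("waited on page index %lu\n", index);
}

static inline void trace_wait_on_page(unsigned long index)
{
	if (unlikely(wait_on_page_trace_enabled))
		probe_wait_on_page(index);
	/*
	 * The "immediate values" optimization patches the constant directly
	 * into the instruction stream instead of loading it from memory, and
	 * the jump-patching variant replaces the whole load + test +
	 * conditional branch with a single patched jump, so the disabled
	 * fast path never touches the data cache.
	 */
}

int main(void)
{
	trace_wait_on_page(42);		/* disabled: probe call skipped */
	wait_on_page_trace_enabled = 1;
	trace_wait_on_page(42);		/* enabled: probe fires */
	return 0;
}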
I'm not saying you have to reproduce it (although Intel's Oracle OLTP
benchmark is very sensitive to that kind of thing, and schedule() is near
the top). But just acknowledge that it adds some cost. OK, you're one of
the few people really trying hard to count every cycle, so I don't mean to
pick on this code in particular.