Re: Another Performance Regression in write() syscall

From: Ingo Molnar
Date: Tue Feb 24 2009 - 12:51:30 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Tue, 24 Feb 2009, Dave Hansen wrote:
> >
> > Yeah, that's a good point. Are we sure that's what is
> > happening here, though? That's one thing a profile would
> > hopefully help with.
>
> One thing to note is that _if_ it's purely an issue of
> nontemporal stores vs normal stores, then profiling is very
> likely going to be almost entirely useless. You'll get
> "results", but the results have nothing what-so-ever to do
> with reality or anything interesting.
>
> The nontemporal stores may stand out in the profiles, but the
> actual performance impact will be all about whether totally
> unrelated code got cache misses or not. Quite often those
> cache misses will also be in user mode, and very possibly in
> other processes.
>
> So profiles can certainly be interesting, but if Salman says
> that his patch makes a difference for his benchmark, then
> profiling is almost certainly not interesting FOR THAT PATCH.
> It's interesting mainly as a way to look at whether there are
> then also _other_ issues that are worth addressing (ie the
> whole atime thing is in a whole different dimension and an
> independent issue).

A 'perfstat' run comparing the unpatched and patched kernels
would certainly be interesting (for cases where a pure
/usr/bin/time run is inconclusive).

That way we can see summary counts for the whole workload, like:

-----------------------------------------------
| Performance counter stats for './mmap-perf' |
-----------------------------------------------
|               |
|  x86-defconfig |   PARAVIRT=y
|------------------------------------------------------------------
|
|    1311.554526 |  1360.624932  task clock ticks (msecs)   +3.74%
|               |
|              1 |            1  CPU migrations
|             91 |           79  context switches
|          55945 |        55943  pagefaults
| ............................................
|     3781392474 |   3918777174  CPU cycles                 +3.63%
|     1957153827 |   2161280486  instructions              +10.43%
|       50234816 |     51303520  cache references           +2.12%
|        5428258 |      5583728  cache misses               +2.86%
|
|      437983499 |    478967061  branches                   +9.36%
|       32486067 |     32336874  branch-misses              -0.46%
|               |
|    1314.782469 |  1363.694447  time elapsed (msecs)       +3.72%
|               |
-----------------------------------

Such a comparison would certainly be more meaningful for such
things than a profile.
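
The percentage column in the table above is simply the relative
change of the second kernel's count against the first. As a rough
sketch (not part of perfstat itself), the same deltas can be
recomputed from two sets of counts in a few lines of Python. The
helper name pct_change and the hand-copied dictionary are
assumptions made for illustration only:

```python
def pct_change(baseline, patched):
    """Relative change of patched vs. baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


# Counts copied by hand from the table above, as
# (x86-defconfig, PARAVIRT=y). A real comparison would use the
# numbers from your own unpatched and patched runs.
counts = {
    "CPU cycles": (3781392474, 3918777174),
    "instructions": (1957153827, 2161280486),
    "cache misses": (5428258, 5583728),
    "branch-misses": (32486067, 32336874),
}

for name, (base, patched) in counts.items():
    # Signed, two decimals, matching the table's delta column.
    print(f"{name:<14} {pct_change(base, patched):+7.2f}%")
```

For the rows copied here this should reproduce the +3.63%,
+10.43%, +2.86% and -0.46% figures from the table.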

Salman, if you are interested in doing a perfstat comparison,
just pick up a tip:master kernel [perfcounters are
default-enabled in it]:

http://people.redhat.com/mingo/tip.git/README

and run perfstat on it (as root, to get the kernel-mode counts
too):

http://redhat.com/~mingo/perfcounters/perfstat.c

Ingo