Re: [BENCHMARK] Lmbench 2.5.54-mm2 (impressive improvements)

From: Andrew Morton (akpm@digeo.com)
Date: Fri Jan 03 2003 - 16:32:27 EST

Next message: Scott Robert Ladd: "RE: Why is Nvidia given GPL'd code to use in closed source drivers?"
Previous message: Larry McVoy: "Re: Nvidia and its choice to read the GPL "differently""
In reply to: Andi Kleen: "Re: [BENCHMARK] Lmbench 2.5.54-mm2 (impressive improvements)"
Next in thread: Andrew Morton: "Re: [BENCHMARK] Lmbench 2.5.54-mm2 (impressive improvements)"

Andi Kleen wrote:
>
> Andrew Morton <akpm@digeo.com> writes:
> >
> > The teeny little microbenchmarks are telling us that the rmap overhead
> > hurts, that the uninlining of copy_*_user may have been a bad idea, that
> > the addition of AIO has cost a little and that the complexity which
> > yielded large improvements in readv(), writev() and SMP throughput was
> > not free. All of this is already known.
>
> If you mean the signal speed regressions they caused - I fixed
> that on x86-64 by inlining sizes 1, 2, 4, 8, 10 (used by the signal
> fpu frame) and 16. But it should not use the stupid rep ; ... of the
> old version, but direct unrolled moves.

Yes, that would help a bit. We should do that for ia32. It's a little
worrisome that the return value from such a copy_*_user() implementation
will be incorrect - it is supposed to return the number of uncopied bytes.
Probably doesn't matter.
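
For illustration only, here is a rough user-space sketch of the idea
being discussed: dispatch on a small constant size so the compiler can
emit direct moves instead of a rep string copy. The names copy_small()
and copy8_as_two_32() are hypothetical and this is not the code in
include/asm-x86_64/uaccess.h. It assumes plain memory that cannot fault,
so the "number of uncopied bytes" return value is always 0 here; a real
copy_*_user() would have to report a partial copy on a fault.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a size-dispatched copy_*_user().
 * Returns the number of bytes NOT copied. In this user-space sketch
 * nothing can fault, so the result is always 0. */
static inline size_t copy_small(void *dst, const void *src, size_t n)
{
    switch (n) {
    /* Constant-size memcpy calls are typically lowered by the
     * compiler to a few plain moves rather than a string copy. */
    case 1:  memcpy(dst, src, 1);  return 0;
    case 2:  memcpy(dst, src, 2);  return 0;
    case 4:  memcpy(dst, src, 4);  return 0;
    case 8:  memcpy(dst, src, 8);  return 0;
    case 10: memcpy(dst, src, 10); return 0; /* size quoted for the signal fpu frame */
    case 16: memcpy(dst, src, 16); return 0;
    default: memcpy(dst, src, n);  return 0; /* generic fallback */
    }
}

/* Sketch of an 8-byte copy done as two 32-bit moves, the way a
 * 64-bit move would have to be split on a 32-bit machine. */
static inline void copy8_as_two_32(void *dst, const void *src)
{
    uint32_t lo, hi;

    memcpy(&lo, src, 4);
    memcpy(&hi, (const unsigned char *)src + 4, 4);
    memcpy(dst, &lo, 4);
    memcpy((unsigned char *)dst + 4, &hi, 4);
}
```

When the size argument is a compile-time constant at the call site and
the function is inlined, the switch should fold away and leave only the
moves for that one size.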

Most of the optimisation opportunities wrt signal delivery were soaked up
by replacing the copy_*_user() calls with put_user() and friends.

We could speed up signals heaps by re-lazying the fpu state storage in
some manner.

> The x86-64 version is in include/asm-x86_64/uaccess.h; it could be
> ported to i386, provided each movq is replaced by two movls.
>
> -Andi
>
> P.S.: regarding recent lmbench slow downs: I'm a bit
> worried about the two wrmsrs which are in the i386 context switch
> in load_esp0 for sysenter now. Last time I benchmarked WRMSRs on
> Athlon they were really slow and knowing the P4 it is probably
> even slower there. Imho it would be better to undo that patch
> and use Linus' original trampoline stack.

hm. How slow? Any numbers on that?

This archive was generated by hypermail 2b29 : Tue Jan 07 2003 - 22:00:22 EST