Re: [PATCH] 2.3.47: Optimize entry.S and hw_irq.h

From: Rainer Keller (Rainer.Keller@studmail.uni-stuttgart.de)
Date: Fri Feb 25 2000 - 06:40:13 EST


Hi Ingo,

Ingo Molnar wrote:
> On Thu, 24 Feb 2000, Rainer Keller wrote:
>
> > Kernel 2.3.47 real: 21m32.763s 491815 Bytes
> > user: 19m35.340s
> > sys: 1m13.340s
> >
> > Kernel 2.3.47-OPT real: 20m15.021s 491921 Bytes
> > user: 18m20.580s
> > sys: 1m 4.930s
> these numbers are _highly_ suspect. entry.S micro-optimizations _never_
> show up in a measurable way in any macro-benchmark. Even switching to
> fast system calls (which shaves off 130 cycles) only showed up in a few
> microbenchmarks.
Hmmm, OK.
Although I have to say, I measured three times -- under 2.3.47-plain the
results differed by 1.2 seconds!

Still, the SAVE_ALL macro in include/asm-i386/hw_irq.h is part of the
patch as well, and as far as I can see it is used for every interrupt
(including the timer interrupt).

> A full kernel compilation like the above uses less than 100k system
> calls. Even optimistically assuming 100 thousand system calls, the
> above numbers mean that your change saved 11185 cycles per system call.
> (on a 133 MHz Pentium) Not to mention that the above code also shows
> better user-space numbers, showing another 25000 cycles improvement on
> the user-space side...
Again, how many interrupts occur during a full compile?

I counted 107 system calls when compiling an empty C file (main(){},
nothing else):

strace gcc -Wall -O2 -pipe -o gcc_sys_calls gcc_sys_calls.c 2>&1 | less

Now, depending on how many files are included, it should be considerably
more.
Of the 5480 .c, .h and .S files currently in 2.3.47 (after make
distclean), I have 269 object files after compiling (make oldconfig dep
bzlilo).
So the lowest possible number of system calls is ~28783 (269 x 107,
assuming of course that there is no #include and no linking).

So after all, counting just system calls (without interrupts), I'd say
there are a lot more than 100k system calls.

> nevertheless it's possible to measure accurately how much improvement
> your code brings, i've attached pid.c which is a very simple
> null-syscall latency tester, i used it for doing similar experiments
> :-) Run it several times to hit the correct cache alignment scenario
> which produces the best latency number. Here i get:
Thanks!
According to your program, I get 159 cycles under 2.3.47-plain and
151 cycles under 2.3.47-OPT.

Is that plausible?!

Greetings raY

-- 
------------------------------------------------------------------------
Rainer Keller            e-mail: Rainer.Keller@studmail.uni-stuttgart.de
Universitaetsstr. 100    WWW: http://wg100.wh.uni-stuttgart.de/~ray
70569 Stuttgart          Tel. 0711 / 6787536



