Re: [PATCH] i386-pda UP optimization

From: Jeremy Fitzhardinge
Date: Thu Nov 16 2006 - 19:12:54 EST


Ingo Molnar wrote:
> what point would there be in using it? It's not like the kernel could
> make use of the thread keyword anytime soon (it would need /all/
> architectures to support it) ...

The plan was to implement the x86 arch-specific percpu stuff to use it,
since it allows gcc better optimisation opportunities.

> and the kernel doesnt mind how the
> current per_cpu() primitives are implemented, via assembly or via C. In
> any case, it very much matters to see the precise cost of having the pda
> selector value in %gs versus %fs.
>

Hm, well, unfortunately for me, there is a small but distinct advantage
to using %fs rather than %gs (around 0-5ns per iteration). The notable
exception being the "AMD-K6(tm) 3D+ Processor", where %gs is about 25%
(15ns) faster.

I'll revise the patches to use %fs and resubmit.

J
"Genuine Intel(R) CPU T2400 @ 1.83GHz" @1000Mhz (6,14,8):
ds=7b fs=0 gs=33 ldt=f gdt=3b CPUTIME
<none> with data selector: 0ns/iteration
fs with data selector: 26ns/iteration
gs with data selector: 30ns/iteration

<none> with LDT selector: 0ns/iteration
fs with LDT selector: 26ns/iteration
gs with LDT selector: 26ns/iteration

<none> with GDT selector: 0ns/iteration
fs with GDT selector: 26ns/iteration
gs with GDT selector: 30ns/iteration

"Intel(R) Pentium(R) 4 CPU 1.80GHz" @1817.9Mhz (15,2,4):
ds=7b fs=0 gs=33 ldt=f gdt=3b CPUTIME
<none> with data selector: 0ns/iteration
fs with data selector: 33ns/iteration
gs with data selector: 34ns/iteration

<none> with LDT selector: 0ns/iteration
fs with LDT selector: 43ns/iteration
gs with LDT selector: 52ns/iteration

<none> with GDT selector: 0ns/iteration
fs with GDT selector: 33ns/iteration
gs with GDT selector: 34ns/iteration

"Intel(R) Celeron(R) CPU 2.40GHz" @2394.47Mhz (15,2,9):
ds=7b fs=0 gs=33 ldt=f gdt=3b CPUTIME
<none> with data selector: 0ns/iteration
fs with data selector: 20ns/iteration
gs with data selector: 24ns/iteration

<none> with LDT selector: 0ns/iteration
fs with LDT selector: 21ns/iteration
gs with LDT selector: 26ns/iteration

<none> with GDT selector: 0ns/iteration
fs with GDT selector: 21ns/iteration
gs with GDT selector: 26ns/iteration

"Pentium 75 - 200" @166.206Mhz (5,2,12):
ds=7b fs=0 gs=33 ldt=f gdt=3b GTOD
<none> with data selector: 1ns/iteration
fs with data selector: 74ns/iteration
gs with data selector: 75ns/iteration

<none> with LDT selector: 1ns/iteration
fs with LDT selector: 74ns/iteration
gs with LDT selector: 75ns/iteration

<none> with GDT selector: 1ns/iteration
fs with GDT selector: 74ns/iteration
gs with GDT selector: 74ns/iteration

"AMD-K6(tm) 3D+ Processor" @451.105Mhz (5,9,1):
ds=7b fs=0 gs=33 ldt=f gdt=3b GTOD
<none> with data selector: 0ns/iteration
fs with data selector: 59ns/iteration
gs with data selector: 44ns/iteration

<none> with LDT selector: 0ns/iteration
fs with LDT selector: 59ns/iteration
gs with LDT selector: 44ns/iteration

<none> with GDT selector: 0ns/iteration
fs with GDT selector: 59ns/iteration
gs with GDT selector: 44ns/iteration

"AMD Athlon(tm) XP 3000+" @2162.74Mhz (6,10,0):
ds=7b fs=0 gs=33 ldt=f gdt=3b CPUTIME
<none> with data selector: 0ns/iteration
fs with data selector: 10ns/iteration
gs with data selector: 11ns/iteration

<none> with LDT selector: 0ns/iteration
fs with LDT selector: 11ns/iteration
gs with LDT selector: 11ns/iteration

<none> with GDT selector: 0ns/iteration
fs with GDT selector: 11ns/iteration
gs with GDT selector: 11ns/iteration


"AMD Athlon(tm) 64 Processor 3500+" @2210.23Mhz (15,31,0):
ds=2b fs=0 gs=63 ldt=f gdt=6b GTOD
<none> with data selector: 0ns/iteration
fs with data selector: 11ns/iteration
gs with data selector: 11ns/iteration

<none> with LDT selector: 0ns/iteration
fs with LDT selector: 10ns/iteration
gs with LDT selector: 11ns/iteration

<none> with GDT selector: 0ns/iteration
fs with GDT selector: 10ns/iteration
gs with GDT selector: 11ns/iteration

"Pentium III (Coppermine)" @700Mhz (6,8,6):
ds=7b fs=0 gs=33 ldt=f gdt=3b CPUTIME
<none> with data selector: 0ns/iteration
fs with data selector: 38ns/iteration
gs with data selector: 45ns/iteration

<none> with LDT selector: 0ns/iteration
fs with LDT selector: 39ns/iteration
gs with LDT selector: 41ns/iteration

<none> with GDT selector: 0ns/iteration
fs with GDT selector: 39ns/iteration
gs with GDT selector: 44ns/iteration