On Mon, 2002-08-12 at 17:12, Ingo Molnar wrote:
>
> On 12 Aug 2002, Luca Barbieri wrote:
>
> > Numbers:
> > unconditional copy of 2 tls descs: 5 cycles
> > this patch with 1 tls desc: 26 cycles
> > this patch with 8 tls descs: 52 cycles
>
> [ 0 tls descs: 2 cycles. ]
Yes, but common multithreaded applications will have at least 1 for
pthreads.
> but yes, this is roughly what i'd say this approach costs.
>
> > lldt: 51 cycles
> > lgdt: 50 cycles
> > context switch: 2000 cycles (measured with pipe read/write and vmstat so
> > it's not very accurate)
>
> > So this patch causes a 1% context switch performance drop for
> > multithreaded applications.
>
> how did you calculate this?
((26 - 5) / 2000) * 100 ~= 1
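A small sketch of that arithmetic, in case anyone wants to plug in other
numbers. The function name overhead_percent is made up for illustration;
the inputs are the cycle counts quoted above.

```c
/* Extra cost of a TLS scheme relative to a baseline scheme, expressed as a
 * percentage of one context switch. All arguments are in CPU cycles.
 * (Hypothetical helper, not part of any patch.) */
static double overhead_percent(double patch_cycles,
                               double baseline_cycles,
                               double switch_cycles)
{
    return (patch_cycles - baseline_cycles) / switch_cycles * 100.0;
}
```

With the figures above, overhead_percent(26, 5, 2000) is about 1.05 for
1 tls desc, and overhead_percent(52, 5, 2000) is about 2.35 for 8 tls
descs. Both depend on the rough 2000-cycle context switch estimate.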
The benchmarks were run in kernel mode (2.4.18) with interrupts disabled
on a Pentium3. Each benchmark is timed with rdtsc in a loop of 1 million
iterations, preceded by 8 untimed iterations to warm up the caches, and
the time to execute an empty benchmark is subtracted from the result.
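For reference, a rough userspace sketch of that measurement scheme. This is
not the code I actually ran (that was in the kernel with interrupts off);
the names raw_ticks, net_ticks, read_counter and empty_bench are invented
here, and timing each iteration separately is an assumption. In userspace
the numbers will be noisier because of interrupts and preemption.

```c
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* Read the CPU timestamp counter (rdtsc). */
static uint64_t read_counter(void) { return __rdtsc(); }
#else
/* Assumption: portable fallback; returns clock() ticks, not CPU cycles. */
static uint64_t read_counter(void) { return (uint64_t)clock(); }
#endif

typedef uint64_t (*counter_fn)(void);
typedef void (*bench_fn)(void);

enum { WARMUP_ITERS = 8, TIMED_ITERS = 1000000 };

/* Empty benchmark: measures only the cost of the timing itself. */
static void empty_bench(void) {}

/* Average ticks per call of fn, still including the timing overhead. */
static double raw_ticks(counter_fn now, bench_fn fn, long warmup, long iters)
{
    uint64_t total = 0;

    /* Untimed iterations to warm up the caches. */
    for (long i = 0; i < warmup; i++)
        fn();

    for (long i = 0; i < iters; i++) {
        uint64_t start = now();
        fn();
        total += now() - start;
    }
    return (double)total / (double)iters;
}

/* Average ticks per call of fn with the empty benchmark subtracted. */
static double net_ticks(counter_fn now, bench_fn fn, long warmup, long iters)
{
    return raw_ticks(now, fn, warmup, iters)
         - raw_ticks(now, empty_bench, warmup, iters);
}
```

A call such as net_ticks(read_counter, some_bench, WARMUP_ITERS, TIMED_ITERS)
would then give the per-call cost of a hypothetical some_bench function.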
> glibc multithreaded applications can avoid the
> lldt via using the TLS, and thus it's a net win.
Surely, this patch is better than the old LDT method but much worse than
the 2-TLS one.
So I would use the 2-TLS approach plus my patch plus the syscall and
segment.h improvements of the tls-2.5.31-C3 patch plus support for
setting the 0x40 segment around APM calls.
BTW, are there any programs that would benefit from having more than 2
user-settable GDT entries but that don't need more than about 8?
(assuming we have a fixed flat code and data segment and 0x40 segment)