Re: [PATCH] i386-pda UP optimization

From: Jeremy Fitzhardinge
Date: Tue Nov 21 2006 - 17:10:37 EST


Andi Kleen wrote:
>> For umask/getppid, assuming you're just running 1e7 iterations, you're
>> seeing a difference of 25 and 35ns per iteration difference. I wonder
>> why it would be different for different syscalls; I would expect it to
>> be a constant overhead either way.
>>
>
> They got different numbers of current references?
>

My understanding is that Eric has changed UP current (and other PDA ops)
to not touch %gs at all, and the difference in reported times in due
omitting the %gs load in entry.S (though %gs is still save/restored on
the stack).

> On such micro benchmarks everything should be cache hot in theory
> (unless it's a system with really small cache)
>

Yes, that would be my thought too, but maybe there's excessive aliasing
on one of the ways, but I think he's using a Pentium M which has a 8-way L1.

>> been planning on a patch to rearrange the gdt in order to pack all the
>> commonly used segment descriptors into one or two cache lines so that
>> all the segment register reloads can be done with a minimum of cache
>> misses. It would be interesting for you to replace the:
>>
>> movl $(__KERNEL_PDA), %edx; movl %edx, %gs
>>
>> with an appropriate read of the gdt entry, hm, which is a bit complex to
>> find.
>>
>
> On UP it could be hardcoded. And oprofile can be used to profile for cache misses.
>

Yes, assuming oprofile doesn't interfere with things too much.
Actually, just counting cache miss events during the course of a syscall
would be most interesting (ie, no need to sample).


J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/