Re: [PATCH] [5/20] x86: Introduce nsec_barrier() II

From: Andi Kleen
Date: Mon Jan 07 2008 - 15:01:18 EST


Andi Kleen <ak@xxxxxxx> writes:

> On Thursday 03 January 2008 11:47:54 Ingo Molnar wrote:
>>
>> * Andi Kleen <ak@xxxxxxx> wrote:
>>
>> > nsec_barrier() is a new barrier primitive that stops RDTSC speculation
>> > to avoid races with timer interrupts on other CPUs.
>> >
>> > Add it to all architectures. Except for x86 it is a nop right now. I
>> > only tested x86, but it's a very simple change.
>> >
>> > On x86 it expands either to LFENCE (for Intel CPUs) or MFENCE (for AMD
>> > CPUs) which stops RDTSC on all currently known microarchitectures that
>> > implement SSE. On CPUs without SSE there is generally no RDTSC
>> > speculation.
>>
>> i've picked up your rdtsc patches into x86.git but have simplified it:
>> there's no nsec_barrier() anymore - rdtsc() is always synchronous.
>> MFENCE/LFENCE is fast enough. Open-coding such barriers almost always
>> leads to needless trouble. Please check the next x86.git tree.
>
> That's most likely wrong unless you added two barriers -- the barriers
> are strictly need to be before and after RDTSC.

I checked your patch now -- 743abf4d987911af1ffce4c96f06cba6ffaa7e88
and 428f309ba5244fa25b44fcdf1d79aa94c8745cfd in gitx86 and you did it
indeed wrong.

The problem is that you inserted the barrier only after the RDTSC, but
the instruction can be speculated both forward and backward and both
can cause inconsistencies if it leaves the critical section.

The minimal fix would be to change native_read_tsc to

rdtsc_barrier();
asm volatile("rdtsc" : EAX_EDX_RET(val, low, high));
rdtsc_barrier();

(e.g. add the missing barrier)

Better would be still to keep the explicit nsec barriers as in
the original patch for several reasons:
- There are situations where RDTSC without barrier is ok and it might
be useful there.
- It is better to give the CPU some room to run uops in parallel;
especially in very performance critical like gtod(). A wider
barriered area is faster.
- I actually expect that other architectures with aggressive OOO
implementations (like POWER4/5) should make use of
nsec_barrier() too. I don't think it's really an x86 specific concept.
- Explicit barriers can be useful when measuring performance, although
admittedly there a x86 specific barrier is usually ok. However forcing
the barrier inside RDTSC is not.

If you insist to keep the incorrect patch please drop my name from it.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/