> I'm trying to understand why you think they are needed at all. Except
> for code that specifically does non-temporal we don't need fences on an
> X86, and the code that uses non temporal stores has its own fences built
> in.
>
> So as far as I can see the only cases we ever have to care about are
>
> PPro - processor bug
> IDT Winchip - because we run it in oostore module not strict x86 mode
>
> I don't see why you are generating extra fence instructions for other
> cases
>
__volatile__ and : : :"memory" omitted from asm statements
Both without and with patch:
- barrier(): asm("")
Without patch:
- mb(): asm("lock; addl $0,0(%%esp)")
- rmb(): asm("lock; addl $0,0(%%esp)")
- wmb: if(OOSTORE) asm("lock; addl $0,0(%%esp)") else barrier()
With patch:
- mb(): if(SSE2) asm("mfence") else asm("lock; addl $0,0(%%esp)")
- rmb(): if(SSE2) asm("lfence") else asm("lock; addl $0,0(%%esp)")
- wmb: if(OOSTORE) {if(MMXEXT) asm("sfence") else asm("lock; addl
$0,0(%%esp)")} else barrier()
So I'm only replacing the lock; addl $0,0(%%esp) with the Xfence
instructions which are more efficient.
As for the need for fences, based on the Intel documentation it seems
that we need read fences to read all hardware locations not mapped as
uncacheable and write fences for all memory locations mapped as write
combining.
Since drivers often map cacheable memory and then use rmb(), rmb()
cannot be made a nop.
This archive was generated by hypermail 2b29 : Wed Aug 07 2002 - 22:00:26 EST