On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave@xxxxxxxxxxxx> wrote:
> So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
> consistently cheaper (by at least half the latency) than MFENCE. While there
> was a decent amount of variation, this difference remained rather constant.

Mind testing "lock addq $0,0(%rsp)" instead of mfence? That's what we
use on old CPUs without one (i.e. 32-bit).

I'm not actually convinced that mfence is necessarily a good idea. I
could easily see it being microcode, for example.

At least on my Haswell, the "lock addq" is pretty much exactly half
the cost of "mfence".
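
For anyone who wants to repeat the comparison, here is a rough user-space sketch of such a microbenchmark. It is not the test either poster ran: the helper names (barrier_mfence, barrier_lock_add, barrier_xchg, ns_per_op) and the iteration count are made up for illustration, and it assumes x86-64 with GCC or Clang inline asm. The indirect call adds the same overhead to each variant, so only the relative numbers mean anything.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#include <stdio.h>
#include <time.h>

/* Assumption: x86-64 only, GCC/Clang extended asm syntax. */
#define ITERS 1000000UL           /* hypothetical iteration count */

static long xchg_dummy;           /* scratch word for the XCHG variant */

/* Full fence via the dedicated instruction. */
static void barrier_mfence(void)
{
    __asm__ __volatile__("mfence" ::: "memory");
}

/* Full fence via a locked no-op add to the top of the stack.
 * Adding 0 leaves the stack contents unchanged. */
static void barrier_lock_add(void)
{
    __asm__ __volatile__("lock; addq $0,0(%%rsp)" ::: "memory", "cc");
}

/* XCHG with a memory operand is implicitly locked. */
static void barrier_xchg(void)
{
    long v = 1;
    __asm__ __volatile__("xchgq %0,%1"
                         : "+r"(v), "+m"(xchg_dummy)
                         :
                         : "memory");
}

/* Average wall-clock nanoseconds per call of fn over ITERS calls. */
static double ns_per_op(void (*fn)(void))
{
    struct timespec a, b;
    unsigned long i;

    clock_gettime(CLOCK_MONOTONIC, &a);
    for (i = 0; i < ITERS; i++)
        fn();
    clock_gettime(CLOCK_MONOTONIC, &b);

    return ((double)(b.tv_sec - a.tv_sec) * 1e9 +
            (double)(b.tv_nsec - a.tv_nsec)) / (double)ITERS;
}
```

Calling ns_per_op() once per barrier function and printing the three results with printf("%.2f\n", ...) gives numbers to compare. Expect them to move around with the microarchitecture, frequency scaling and whatever else the machine is doing, so pin the task to one CPU and repeat the run a few times.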