Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

From: Davidlohr Bueso
Date: Mon Nov 02 2015 - 20:36:40 EST

Next message: Krzysztof Kozlowski: "Re: [PATCH v3 1/7] ARM: EXYNOS: removing redundant code from regs-pmu.h"
Previous message: Namhyung Kim: "Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)"
In reply to: Linus Torvalds: "Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, 02 Nov 2015, Linus Torvalds wrote:

> On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave@xxxxxxxxxxxx> wrote:
>
>> So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
>> consistently cheaper (by at least half the latency) than MFENCE. While there
>> was a decent amount of variation, this difference remained rather constant.

> Mind testing "lock addq $0,0(%rsp)" instead of mfence? That's what we
> use on old CPUs without one (ie 32-bit).

I'm getting results very close to xchg.

> I'm not actually convinced that mfence is necessarily a good idea. I
> could easily see it being microcode, for example.

Interesting.

> At least on my Haswell, the "lock addq" is pretty much exactly half
> the cost of "mfence".

Ok, this coincides with my results on IvB.
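
For reference, the three store-plus-full-barrier variants compared in this
thread can be sketched in userspace roughly as below. This is only an
illustration, not the kernel's smp_store_mb() and not the harness used for
the numbers above: the names store_mb_xchg(), store_mb_mfence(),
store_mb_lock_add() and ns_per_op() are made up here, and the timing loop is
a crude single-threaded one based on clock(). On anything other than x86-64
the barriers fall back to the compiler builtin __sync_synchronize() purely so
that the file still builds.

```c
#include <time.h>

/* Variant 1: XCHG with a memory operand is implicitly locked on x86,
 * so the store and the full barrier are a single instruction. */
static inline void store_mb_xchg(volatile long *p, long v)
{
#if defined(__x86_64__)
	__asm__ __volatile__("xchgq %0, %1" : "+r" (v), "+m" (*p) : : "memory");
#else
	*p = v;
	__sync_synchronize();	/* fallback so the sketch builds elsewhere */
#endif
}

/* Variant 2: plain store followed by MFENCE. */
static inline void store_mb_mfence(volatile long *p, long v)
{
	*p = v;
#if defined(__x86_64__)
	__asm__ __volatile__("mfence" : : : "memory");
#else
	__sync_synchronize();
#endif
}

/* Variant 3: plain store followed by a locked no-op add on the stack,
 * i.e. the "lock addq $0,0(%rsp)" suggested above. */
static inline void store_mb_lock_add(volatile long *p, long v)
{
	*p = v;
#if defined(__x86_64__)
	__asm__ __volatile__("lock; addq $0,0(%%rsp)" : : : "memory", "cc");
#else
	__sync_synchronize();
#endif
}

/* Hypothetical helper: average CPU time per call in nanoseconds.
 * Loop and call overhead are included, so only the relative
 * differences between variants are meaningful. */
static double ns_per_op(void (*fn)(volatile long *, long), long iters)
{
	volatile long slot = 0;
	clock_t c0, c1;
	long i;

	c0 = clock();
	for (i = 0; i < iters; i++)
		fn(&slot, i);
	c1 = clock();

	return (double)(c1 - c0) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Calling ns_per_op() on each variant with a large iteration count (say, tens
of millions) and printing the three results gives a rough comparison. The
absolute values depend heavily on the microarchitecture and on how the
compiler treats the loop, so they should be read as relative costs only.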