Re: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?

From: Michael S. Tsirkin
Date: Wed Aug 03 2016 - 00:36:51 EST


On Thu, Mar 03, 2016 at 11:05:43AM -0800, H. Peter Anvin wrote:
> On March 3, 2016 10:35:50 AM PST, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> >On Thu, Mar 03, 2016 at 04:34:53PM +0100, Peter Zijlstra wrote:
> >> On Thu, Mar 03, 2016 at 04:27:39PM +0100, Ingo Molnar wrote:
> >> >
> >> > * Dexuan Cui <decui@xxxxxxxxxxxxx> wrote:
> >> >
> >> > > Hi,
> >> > > My understanding about arch/x86/include/asm/barrier.h is: obviously Linux
> >> > > more likes {L,S,M}FENCE -- Locked ADD is only used in x86_32 platforms that
> >> > > don't support XMM2.
> >> > >
> >> > > However, it looks people say Locked Add is much faster than the FENCE
> >> > > instructions, even on modern Intel CPUs like Haswell, e.g., please see
> >> > > the three sources:
> >> > >
> >> > > " 11.5.1 Locked Instructions as Memory Barriers
> >> > > Optimization
> >> > > Use locked instructions to implement Store/Store and Store/Load barriers.
> >> > > "
> >> > > http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf
> >> > >
> >> > > "lock addl %(rsp), 0 is a better solution for StoreLoad barrier":
> >> > > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
> >> > >
> >> > > "...locked instruction are more efficient barriers...":
> >> > > http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/
> >> > >
> >> > > I also found that FreeBSD prefers Locked Add.
> >> > >
> >> > > So, I'm curious why Linux prefers MFENCE.
> >> > > I guess I may be missing something.
> >> > >
> >> > > I tried to google the question, but didn't find an answer.
> >> >
> >> > It's being worked on, see this thread on lkml from a few weeks ago:
> >> >
> >> > C Jan 13 Michael S. Tsir | [PATCH v3 0/4] x86: faster mb()+documentation tweaks
> >> > C Jan 13 Michael S. Tsir | ├─>[PATCH v3 1/4] x86: add cc clobber for addl
> >> > C Jan 13 Michael S. Tsir | ├─>[PATCH v3 2/4] x86: drop a comment left over from X86_OOSTORE
> >> > C Jan 13 Michael S. Tsir | ├─>[PATCH v3 3/4] x86: tweak the comment about use of wmb for IO
> >> > C Jan 13 Michael S. Tsir | └─>[PATCH v3 4/4] x86: drop mfence in favor of lock+addl
> >> >
> >> > The 4th patch changes MFENCE to a LOCK ADDL locked instruction.
> >>
> >> Lots of additional chatter here:
> >>
> >> lkml.kernel.org/r/20160112150032-mutt-send-email-mst@xxxxxxxxxx
> >>
> >> And some useful bits here:
> >>
> >> lkml.kernel.org/r/56957D54.5000602@xxxxxxxxx
> >>
> >> latest version here:
> >>
> >> lkml.kernel.org/r/1453921746-16178-1-git-send-email-mst@xxxxxxxxxx
> >
> >It's ready as far as I am concerned.
> >Basically we are just waiting for ack from hpa.
>
> And I'm still discussing this with the hardware people. It seems we
> can do this for *most* things, but not all; the question is where
> exactly we need to do something different.

I'm guessing there's still no update?

There's a decent chance that without documentation a bunch of current
uses are actually broken. See for example
http://marc.info/?l=linux-kernel&m=145400059304553&w=2
which, going by the manual, fixes an smp_mb misuse for clflush - or maybe not?
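
For readers following along, the two barrier forms being compared in this
thread can be sketched in user-space C roughly as below. This is only an
illustrative sketch, not the kernel's barrier.h: the helper names
barrier_mfence(), barrier_lock_addl() and store_then_load() are made up
here, the 0(%rsp) operand is an assumption, and builds for anything other
than x86-64 fall back to the compiler builtin __sync_synchronize(). It says
nothing about how either form orders clflush, which is the open question
above.

```c
#include <assert.h>   /* only needed for the usage checks shown afterwards */

#if defined(__x86_64__)
/* Full barrier using the dedicated fence instruction. */
static inline void barrier_mfence(void)
{
    __asm__ __volatile__("mfence" ::: "memory");
}

/*
 * Full barrier using a locked read-modify-write on the stack.
 * Adding 0 leaves the memory value unchanged; "cc" is listed because
 * ADD writes the flags register.
 */
static inline void barrier_lock_addl(void)
{
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
}
#else
/* Assumed fallback so the sketch still builds on other architectures. */
static inline void barrier_mfence(void)    { __sync_synchronize(); }
static inline void barrier_lock_addl(void) { __sync_synchronize(); }
#endif

static volatile int flag_a, flag_b;

/*
 * The store-then-load step that needs a StoreLoad barrier:
 * publish our flag, fence, then read the other side's flag.
 */
static int store_then_load(int use_locked_add)
{
    flag_a = 1;
    if (use_locked_add)
        barrier_lock_addl();
    else
        barrier_mfence();
    return flag_b;
}
```

Single-threaded, store_then_load(0) and store_then_load(1) both return the
current value of flag_b; the difference between the two forms only shows up
in cost and in whatever corner cases the hardware people are still checking.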

> --
> Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
