Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks

From: Michael S. Tsirkin
Date: Thu Oct 11 2018 - 14:11:47 EST


On Thu, Oct 11, 2018 at 10:37:07AM -0700, Andres Freund wrote:
> Hi,
>
> On 2016-01-26 10:20:14 +0200, Michael S. Tsirkin wrote:
> > On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> > > On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > > > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > > > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> > > >
> > > > So let's use the locked variant everywhere - helps keep the code simple as
> > > > well.
> > > >
> > > > While I was at it, I found some inconsistencies in the comments in
> > > > arch/x86/include/asm/barrier.h.
> > > >
> > > > I hope I'm not splitting this up too much - the reason is I wanted to isolate
> > > > the code changes (that people might want to test for performance) from the comment
> > > > changes approved by Linus, and from the (so far unreviewed) comment change I came up
> > > > with myself.
> > > >
> > > > Lightly tested on my system.
> > > >
> > > > Michael S. Tsirkin (3):
> > > > x86: drop mfence in favor of lock+addl
> > > > x86: drop a comment left over from X86_OOSTORE
> > > > x86: tweak the comment about use of wmb for IO
> > > >
> > >
> > > I would like to get feedback from the hardware team about the
> > > implications of this change, first.
>
> > Any luck getting some feedback on this one?
>
> Ping? I just saw a bunch of kernel fences in a benchmark, which made me
> wonder why Linux uses mfence rather than lock addl, and that led me to
> this thread.
>
> Greetings,
>
> Andres Freund

Linux no longer uses mfence for smp_mb():

commit 450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730
Author: Michael S. Tsirkin <mst@xxxxxxxxxx>
Date: Fri Oct 27 19:14:31 2017 +0300

locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
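For anyone who wants to compare the two forms locally, here is a minimal
user-space sketch of both barriers plus a crude timing loop. It assumes
GCC/Clang inline asm on x86-64 and falls back to a C11 seq_cst fence
elsewhere. The demo_* names are hypothetical; the asm strings only follow
the general shape of arch/x86/include/asm/barrier.h, this is not the
kernel's code, and it makes no claim to reproduce the 2 to 3 times figure
quoted above (numbers vary a lot between CPUs).

```c
/* Sketch only: user-space stand-ins for the two x86 full-barrier forms
 * discussed in this thread. demo_* names are hypothetical. */
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <time.h>

#if defined(__x86_64__)
/* mb()-style full barrier: MFENCE. */
static void demo_mb_mfence(void)
{
    __asm__ __volatile__("mfence" ::: "memory");
}

/* smp_mb()-style full barrier: a LOCK-prefixed add of zero to a stack
 * slot just below %rsp. The stored value is unchanged, but the locked
 * read-modify-write orders ordinary loads and stores around it.
 * "cc" is clobbered because addl writes the flags. */
static void demo_mb_lock_add(void)
{
    __asm__ __volatile__("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
}
#else
/* Fallback so the sketch still builds off x86-64: both variants become
 * a C11 sequentially consistent fence. */
static void demo_mb_mfence(void)   { atomic_thread_fence(memory_order_seq_cst); }
static void demo_mb_lock_add(void) { atomic_thread_fence(memory_order_seq_cst); }
#endif

/* Time `iters` calls of `barrier` in nanoseconds. The indirect call adds
 * roughly the same overhead to both variants. */
static long long demo_time_ns(void (*barrier)(void), long iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        barrier();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
}
```

Calling demo_time_ns(demo_mb_mfence, 1000000) and
demo_time_ns(demo_mb_lock_add, 1000000) on the same idle machine gives a
rough ratio; a real comparison would pin the thread and repeat the runs.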


I didn't bother with mb() since I didn't think it was performance
critical, and one needs to worry about drivers possibly doing
non-temporal stores etc., which do need mfence.
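To illustrate the kind of code meant here, a minimal user-space sketch
of a "fill a buffer with streaming stores, then raise a flag" pattern,
assuming SSE2 intrinsics from <emmintrin.h> (with a plain-store fallback
when SSE2 is unavailable). The function and the flag protocol are
hypothetical, not taken from any driver. Streaming (MOVNTI) stores are
weakly ordered, so they need an explicit SFENCE or MFENCE before the flag
store; code like this relies on mb() staying MFENCE.

```c
/* Sketch only: streaming stores followed by MFENCE before publishing.
 * demo_publish and the ready-flag protocol are hypothetical. */
#include <stdatomic.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Fill buf[0..n) with val, then set *ready so that a consumer that
 * observes ready == 1 also observes the data. */
static void demo_publish(int *buf, size_t n, int val, atomic_int *ready)
{
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&buf[i], val);  /* MOVNTI: weakly ordered store */
    _mm_mfence();                       /* make the streaming stores visible first */
#else
    for (size_t i = 0; i < n; i++)
        buf[i] = val;                   /* ordinary stores on non-SSE2 targets */
    atomic_thread_fence(memory_order_seq_cst);
#endif
    /* Release store keeps the compiler from sinking earlier writes below it. */
    atomic_store_explicit(ready, 1, memory_order_release);
}
```

A consumer would load the flag with memory_order_acquire before reading
the buffer.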

Do you see mb() in a benchmark then?

--
MST