Re: [RFC PATCH v2] arm64: barrier: add macro dgh() to control memory accesses merging

From: Catalin Marinas
Date: Tue Dec 14 2021 - 06:20:45 EST


On Tue, Dec 14, 2021 at 10:46:20AM +0000, Will Deacon wrote:
> On Fri, Dec 03, 2021 at 12:47:47PM +0000, Catalin Marinas wrote:
> > On Fri, Oct 22, 2021 at 09:51:40AM +0800, Xiongfeng Wang wrote:
> > > On 2021/10/16 1:59, Catalin Marinas wrote:
> > > > On Fri, Oct 15, 2021 at 05:05:11PM +0800, Xiongfeng Wang wrote:
> > > >> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> > > >> index 451e11e5fd23..d71a7457d619 100644
> > > >> --- a/arch/arm64/include/asm/barrier.h
> > > >> +++ b/arch/arm64/include/asm/barrier.h
> > > >> @@ -18,6 +18,14 @@
> > > >> #define wfe() asm volatile("wfe" : : : "memory")
> > > >> #define wfi() asm volatile("wfi" : : : "memory")
> > > >>
> > > >> +/*
> > > >> + * Data Gathering Hint:
> > > >> + * This instruction prohibits merging memory accesses with Normal-NC or
> > > >> + * Device-GRE attributes before the hint instruction with any memory accesses
> > > >> + * appearing after the hint instruction.
> > > >> + */
> > > >> +#define dgh() asm volatile("hint #6" : : : "memory")
> > > >
> > > > On its own, this patch doesn't do anything. It's more interesting to see
> > > > how it will be used and maybe come up with a common name that other
> > > > architectures would share (or just implement as no-opp). I'm not sure
> > > > there was any conclusion last time we discussed this.
> > >
> > > In the last mail, I was suggested to investigate the code in other architecture
> > > to find if there exists similar interface. I searched 'merg' in the code and
> > > didn't find similar interface.
> >
> > Maybe no other architecture has such hint. They have write buffer
> > draining but that's more expensive.
> >
> > > The only thing similar I found is in Intel Software Developer's Manual. It says
> > > "Write Combining (WC) ... Writes may be delayed and combined in the write
> > > combining buffer (WC buffer) to reduce memory accesses. If the WC buffer is
> > > partially filled, the writes may be delayed until the next occurrence of a
> > > serializing event; such as an SFENCE or MFENCE instruction, CPUID or other
> > > serializing instruction, a read or write to uncached memory, an interrupt
> > > occurrence, or an execution of a LOCK instruction (including one with an
> > > XACQUIRE or XRELEASE prefix)."
> > > Maybe this is more like the write combine buffer flushing, not prevent merging.
> > > Sorry I still didn't understand the difference clearly.
> >
> > IIUC those drivers on x86 just rely on the microarchitecture aspects of
> > the draining (e.g. fill 64 bytes). On Arm we don't have such guarantee
> > as there's a wide variation in implementations, hence the DGH
> > instruction.
> >
> > > How about a common name called 'merge_prohibit_hint()'? Could you give me some
> > > suggestions ?
> >
> > I think "prohibit" looks more like not allowing any write-buffer merge.
> > Maybe stop_merge_hint(), stop_write_buffer_merge(),
> > stop_write_combining() (any other ideas?). It would be a NOP on all
> > other architectures.
>
> Given that DGH only affects prior memory accesses to Normal-NC or Device-GRE
> memory, we probably want to avoid making this too broad. We don't use GRE in
> Linux and I think Normal-NC is only exposed as an explicit part of the I/O
> accessors in ioremap_wc(). So I'd be inclined to add something like:
>
> io_stop_wc()
>
> with corresponding documentation in Documentation/memory-barriers.txt.

Looks fine to me.

--
Catalin