Re: [PATCH 2/3] MIPS: enforce LL-SC loop enclosing with SYNC (ACQUIRE and RELEASE)

From: James Hogan
Date: Tue Jun 02 2015 - 07:39:20 EST


Hi Leonid,

On 02/06/15 01:09, Leonid Yegoshin wrote:
> Many MIPS32 R2 and all MIPS R6 CPUs are out of order execution, so it
> needs memory barriers in SMP environment. However, past cores may have
> a pipeline short enough to ignore that requirements and problem may
> never occurs until recently.
>
> This patch gives an option to enclose LL-SC loops by SYNC barriers in spinlocks,
> atomics, futexes, cmpxchg and bitops.

Please reflow text.

>
> So, this option is defined for MIPS32 R2 only, because that recent

s/that/those/

> CPUs may occasionally have problems in accordance with HW team.

"have problems in accordance with HW team" is a bit confusing. What do
you mean?

> And most of MIPS64 R2 vendor processors already have some kind of memory
> barrier and the only one generic 5KEs has a pretty short pipeline.
>
> Using memory barriers in MIPS R6 is mandatory, all that

s/that/those/

> processors have a speculative memory read which can inflict a trouble

s/a //

> without a correct use of barriers in LL-SC loop cycles.
> The same is actually for MIPS32 R5 I5600 processor.

Actually *true*?
P5600 you mean?

Same in Kconfig help text.


>
> Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@xxxxxxxxxx>
> ---
> arch/mips/Kconfig | 25 +++++++++++++++++++++++++
> arch/mips/include/asm/barrier.h | 26 ++++++++++++++++++++++----
> 2 files changed, 47 insertions(+), 4 deletions(-)
>
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index c7d0cacece3d..676eb64f5545 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -1896,6 +1896,30 @@ config MIPS_LIGHTWEIGHT_SYNC
> converted to generic "SYNC 0".
>
> If you unsure, say N here, it may slightly decrease your performance
> +
> +config MIPS_ENFORCE_WEAK_REORDERING_BEYOND_LLSC
> + bool "Enforce memory barriers at LLSC loops - atomics, spinlocks etc"
> + depends on CPU_MIPS32_R2
> + default y if CPU_MIPSR6
> + select WEAK_REORDERING_BEYOND_LLSC
> + help
> + Many MIPS32 R2 and all MIPS R6 CPUs are out of order execution, so it
> + needs memory barriers in SMP environment. However, past cores may have
> + a pipeline short enough to ignore that requirements and problem may
> + never occurs until recently.
> +
> + So, this option is defined for MIPS32 R2 only, because that recent
> + CPUs may occasionally have problems in accordance with HW team.
> + And MIPS64 R2 vendor processors already have some kind of memory
> + barrier and the only one generic 5KEs has a pretty short pipeline.
> +
> + Using memory barriers in MIPS R6 is mandatory, all that
> + processors have a speculative memory read which can inflict a trouble
> + without a correct use of barriers in LL-SC loop cycles.
> + The same is actually for MIPS32 R5 I5600 processor.
> +
> + If you unsure, say Y here, it may slightly decrease your performance

If you *are* unsure (same in patch 1). Same comment as patch 1 too about
ambiguity of "it".

> + but increase a reliability.

well only if the hardware has weak ordering, and only because it would
be technically wrong in that case to say N here.

It feels wrong to be giving the user this option. Can't we just select
WEAK_REORDERING_BEYOND_LLSC automatically based on the hardware that
needs to be supported by the kernel configuration (e.g. CPU_MIPSR6 or
CPU_MIPS32_R2)? Those who care about mips r2 performance on hardware
which doesn't strictly need it can always speak up / add an exception.

Out of interest, are futex operations safe with weak llsc ordering, on
the premise that they're mainly used by userland so ordering with normal
kernel accesses just doesn't matter in practice?

> endmenu
>
> #
> @@ -1924,6 +1948,7 @@ config CPU_MIPSR2
> config CPU_MIPSR6
> bool
> default y if CPU_MIPS32_R6 || CPU_MIPS64_R6
> + select MIPS_ENFORCE_WEAK_REORDERING_BEYOND_LLSC
> select MIPS_SPRAM
>
> config EVA
> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> index d2a63abfc7c6..f3cc7a91ac0d 100644
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -95,33 +95,51 @@
> # define smp_mb() __sync()
> # define smp_rmb() barrier()
> # define smp_wmb() __syncw()
> +# define smp_acquire() __sync()
> +# define smp_release() __sync()
> # else
> # ifdef CONFIG_MIPS_LIGHTWEIGHT_SYNC
> # define smp_mb() __asm__ __volatile__("sync 0x10" : : :"memory")
> # define smp_rmb() __asm__ __volatile__("sync 0x13" : : :"memory")
> # define smp_wmb() __asm__ __volatile__("sync 0x4" : : :"memory")
> +# define smp_acquire() __asm__ __volatile__("sync 0x11" : : :"memory")
> +# define smp_release() __asm__ __volatile__("sync 0x12" : : :"memory")

same question about use of sync_acquire/sync_release instructions which
binutils understands since 2.21.

> # else
> # define smp_mb() __asm__ __volatile__("sync" : : :"memory")
> # define smp_rmb() __asm__ __volatile__("sync" : : :"memory")
> # define smp_wmb() __asm__ __volatile__("sync" : : :"memory")
> +# define smp_acquire() __asm__ __volatile__("sync" : : :"memory")
> +# define smp_release() __asm__ __volatile__("sync" : : :"memory")
> # endif
> # endif
> #else
> #define smp_mb() barrier()
> #define smp_rmb() barrier()
> #define smp_wmb() barrier()
> +#define smp_acquire() barrier()
> +#define smp_release() barrier()
> #endif
>
> #if defined(CONFIG_WEAK_REORDERING_BEYOND_LLSC) && defined(CONFIG_SMP)
> +#ifdef CONFIG_MIPS_LIGHTWEIGHT_SYNC
> +#define __WEAK_LLSC_MB " sync 0x10 \n"
> +#define __WEAK_ACQUIRE " sync 0x11 \n"
> +#define __WEAK_RELEASE " sync 0x12 \n"

tabs

__WEAK_ACQUIRE and __WEAK_RELEASE are specific to llsc, maybe call them
__WEAK_LLSC_ACQUIRE and __WEAK_LLSC_RELEASE instead to avoid confusion.


> +#else
> #define __WEAK_LLSC_MB " sync \n"
> +#define __WEAK_ACQUIRE __WEAK_LLSC_MB
> +#define __WEAK_RELEASE __WEAK_LLSC_MB

tabs

> +#endif
> #else
> #define __WEAK_LLSC_MB " \n"
> +#define __WEAK_ACQUIRE __WEAK_LLSC_MB
> +#define __WEAK_RELEASE __WEAK_LLSC_MB

tabs

> #endif
>
> #define set_mb(var, value) \
> do { var = value; smp_mb(); } while (0)
>
> -#define smp_llsc_mb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
> +#define smp_llsc_mb() __asm__ __volatile__(__WEAK_ACQUIRE : : :"memory")

tabs

>
> #ifdef CONFIG_CPU_CAVIUM_OCTEON
> #define smp_mb__before_llsc() smp_wmb()
> @@ -131,14 +149,14 @@
> "syncw\n\t" \
> ".set pop" : : : "memory")
> #else
> -#define smp_mb__before_llsc() smp_llsc_mb()
> +#define smp_mb__before_llsc() __asm__ __volatile__(__WEAK_RELEASE : : :"memory")
> #define nudge_writes() mb()
> #endif
>
> #define smp_store_release(p, v) \
> do { \
> compiletime_assert_atomic_type(*p); \
> - smp_mb(); \
> + smp_release(); \
> ACCESS_ONCE(*p) = (v); \
> } while (0)
>
> @@ -146,7 +164,7 @@ do { \
> ({ \
> typeof(*p) ___p1 = ACCESS_ONCE(*p); \
> compiletime_assert_atomic_type(*p); \
> - smp_mb(); \
> + smp_acquire(); \
> ___p1; \
> })
>
>
>

This patch does 3 logically separable things:
1) add smp_release/smp_acquire based on MIPS_LIGHTWEIGHT_SYNC and weaken
smp_store_release()/smp_load_acquire() to use them.
2) weaken llsc barriers when MIPS_LIGHTWEIGHT_SYNC.
3) the MIPS_ENFORCE_WEAK_REORDERING_BEYOND_LLSC Kconfig stuff (or
whatever method to select WEAK_REORDERING_BEYOND_LLSC more often).

Any reason not to split them, and give a clear description of each?

Cheers
James

Attachment: signature.asc
Description: OpenPGP digital signature