Re: [PATCH v2 05/12] MIPS: Barrier: Add definitions of SYNC stype values

From: Peter Zijlstra
Date: Wed Sep 07 2016 - 07:24:35 EST



These seem to be verbatim copies of the text from the manual. A few
questions below.

On Wed, Sep 07, 2016 at 10:45:13AM +0100, Matt Redfearn wrote:

> +/*
> + * Completion barriers:
> + * - Every synchronizable specified memory instruction (loads or stores or both)
> + * that occurs in the instruction stream before the SYNC instruction must be
> + * already globally performed before any synchronizable specified memory
> + * instructions that occur after the SYNC are allowed to be performed, with
> + * respect to any other processor or coherent I/O module.
> + *
> + * - The barrier does not guarantee the order in which instruction fetches are
> + * performed.
> + *
> + * - A stype value of zero will always be defined such that it performs the most
> + * complete set of synchronization operations that are defined. This means
> + * stype zero always does a completion barrier that affects both loads and
> + * stores preceding the SYNC instruction and both loads and stores that are
> + * subsequent to the SYNC instruction. Non-zero values of stype may be defined
> + * by the architecture or specific implementations to perform synchronization
> + * behaviors that are less complete than that of stype zero. If an
> + * implementation does not use one of these non-zero values to define a
> + * different synchronization behavior, then that non-zero value of stype must
> + * act the same as stype zero completion barrier. This allows software written
> + * for an implementation with a lighter-weight barrier to work on another
> + * implementation which only implements the stype zero completion barrier.
> + *
> + * - A completion barrier is required, potentially in conjunction with SSNOP (in
> + * Release 1 of the Architecture) or EHB (in Release 2 of the Architecture),
> + * to guarantee that memory reference results are visible across operating
> + * mode changes. For example, a completion barrier is required on some
> + * implementations on entry to and exit from Debug Mode to guarantee that
> + * memory effects are handled correctly.
> + */
> +
> +/*
> + * stype 0 - A completion barrier that affects preceding loads and stores and
> + * subsequent loads and stores.
> + * Older instructions which must reach the load/store ordering point before the
> + * SYNC instruction completes: Loads, Stores
> + * Younger instructions which must reach the load/store ordering point only
> + * after the SYNC instruction completes: Loads, Stores
> + * Older instructions which must be globally performed when the SYNC instruction
> + * completes: Loads, Stores
> + */
> +#define STYPE_SYNC 0x0

So, and I think there was no confusion on this point, "SYNC 0" is fully
transitive. Everything prior to the SYNC must be globally visible before
we continue.

> +/*
> + * Ordering barriers:
> + * - Every synchronizable specified memory instruction (loads or stores or both)
> + * that occurs in the instruction stream before the SYNC instruction must
> + * reach a stage in the load/store datapath after which no instruction
> + * re-ordering is possible before any synchronizable specified memory
> + * instruction which occurs after the SYNC instruction in the instruction
> + * stream reaches the same stage in the load/store datapath.
> + *
> + * - If any memory instruction before the SYNC instruction in program order,
> + * generates a memory request to the external memory and any memory
> + * instruction after the SYNC instruction in program order also generates a
> + * memory request to external memory, the memory request belonging to the
> + * older instruction must be globally performed before the time the memory
> + * request belonging to the younger instruction is globally performed.
> + *
> + * - The barrier does not guarantee the order in which instruction fetches are
> + * performed.
> + */
> +
> +/*
> + * stype 0x10 - An ordering barrier that affects preceding loads and stores and
> + * subsequent loads and stores.
> + * Older instructions which must reach the load/store ordering point before the
> + * SYNC instruction completes: Loads, Stores
> + * Younger instructions which must reach the load/store ordering point only
> + * after the SYNC instruction completes: Loads, Stores
> + * Older instructions which must be globally performed when the SYNC instruction
> + * completes: N/A
> + */
> +#define STYPE_SYNC_MB 0x10

This I'm not sure of; it states that things must become globally visible
in the order specified, but the wording leaves a fairly big hole. It
doesn't state that things cannot be less than globally visible at
intermediate times.

To take the example from Documentation/memory-barriers.txt:

    CPU 1                   CPU 2                   CPU 3
    ======================= ======================= =======================
            { X = 0, Y = 0 }
    STORE X=1               LOAD X                  STORE Y=1
                            <general barrier>       <general barrier>
                            LOAD Y                  LOAD X

Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
This indicates that CPU 2's load from X in some sense follows CPU 1's
store to X and that CPU 2's load from Y in some sense preceded CPU 3's
store to Y. The question is then "Can CPU 3's load from X return 0?"
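
For reference, here is a minimal sketch of what the fully transitive answer
looks like. It assumes a single global memory in which every access takes
effect instantly for all CPUs, so the general barriers reduce to program
order. It says nothing about what MIPS hardware actually does. All names
(struct state, step, explore, count_forbidden) are made up for illustration.
It enumerates every interleaving of the five accesses and counts the ones
where CPU 2 sees X == 1 and Y == 0 while CPU 3 sees X == 0:

```c
#include <assert.h>

/* Shared memory plus the three observed load results. */
struct state {
	int x, y;	/* shared variables, both initially 0 */
	int r1;		/* CPU 2: LOAD X */
	int r2;		/* CPU 2: LOAD Y */
	int r3;		/* CPU 3: LOAD X */
};

/* Number of memory operations per CPU, in program order. */
static const int len[3] = { 1, 2, 2 };

static int total, forbidden;

/* Perform operation number pc of the given CPU against the single memory. */
static struct state step(struct state s, int cpu, int pc)
{
	assert(cpu >= 0 && cpu < 3 && pc >= 0 && pc < len[cpu]);

	if (cpu == 0)
		s.x = 1;		/* CPU 1: STORE X=1 */
	else if (cpu == 1 && pc == 0)
		s.r1 = s.x;		/* CPU 2: LOAD X */
	else if (cpu == 1)
		s.r2 = s.y;		/* CPU 2: LOAD Y */
	else if (pc == 0)
		s.y = 1;		/* CPU 3: STORE Y=1 */
	else
		s.r3 = s.x;		/* CPU 3: LOAD X */
	return s;
}

/* Try every CPU that still has work; program order is kept per CPU. */
static void explore(struct state s, int pc[3])
{
	int cpu, done = 1;

	for (cpu = 0; cpu < 3; cpu++) {
		if (pc[cpu] == len[cpu])
			continue;
		done = 0;
		pc[cpu]++;
		explore(step(s, cpu, pc[cpu] - 1), pc);
		pc[cpu]--;
	}
	if (done) {
		total++;
		if (s.r1 == 1 && s.r2 == 0 && s.r3 == 0)
			forbidden++;
	}
}

/* Returns the number of interleavings that produce the outcome in question. */
int count_forbidden(int *total_out)
{
	int pc[3] = { 0, 0, 0 };
	struct state init = { 0, 0, -1, -1, -1 };

	total = 0;
	forbidden = 0;
	explore(init, pc);
	if (total_out)
		*total_out = total;
	return forbidden;
}
```

Under that assumption count_forbidden() walks 30 interleavings and returns 0:
X == 1 then Y == 0 on CPU 2 forces CPU 1's store ahead of CPU 3's store, and
CPU 3's load of X comes after its own store. The open question is whether an
ordering-only barrier still gives that chain when there is no single memory.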


Is it ever possible for CPU2 and CPU3 to match "SYNC 0x10" points but to
disagree on their loads of X?

That is, even though CPU2 and CPU3 agree on their respective past and
future stores, is the 'happens before' relation that CPU1 and CPU2 have
wrt. X not included?