>>> @@ -23,6 +23,7 @@
>>> #define REG_L __REG_SEL(ld, lw)
>>> #define REG_S __REG_SEL(sd, sw)
>>> #define REG_SC __REG_SEL(sc.d, sc.w)
>>> +#define REG_AMOSWAP_AQ __REG_SEL(amoswap.d.aq, amoswap.w.aq)
>> Below is the reason why I use the relax version here:
>> https://lore.kernel.org/all/CAJF2gTRAEX_jQ_w5H05dyafZzHq+P5j05TJ=C+v+OL__GQam4A@xxxxxxxxxxxxxx/T/#u
>
> Sorry, I hadn't seen that one. Adding Andrea. IMO the acquire/release pair is necessary here, with just relaxed the stack stores inside the lock could show up on the next hart trying to use the stack.
I think what you really want is a *consume* barrier, and since you have
the data dependency between the amoswap and the memory accesses (and
this isn’t Alpha) you’re technically fine without the acquire, since
you’re writing assembly and have the data dependency as syntactic.
Though you may still want (need?) the acquire so loads/stores unrelated
to the stack pointer that happen later in program order get ordered
after the load of the new stack pointer in case there could be weird
issues *there*.
Agreed.
Just the fact that this is the 4th iteration of this discussion strongly
suggests to stick to the acquire and these inline comments to me. ;)