Re: [PATCH 2/9] RISC-V: Atomic and Locking Code
From: Boqun Feng
Date: Thu Jul 06 2017 - 22:14:40 EST
On Thu, Jul 06, 2017 at 06:04:13PM -0700, Palmer Dabbelt wrote:
[...]
> >> +#define __smp_load_acquire(p) \
> >> +do { \
> >> + union { typeof(*p) __val; char __c[1]; } __u = \
> >> + { .__val = (__force typeof(*p)) (v) }; \
> >> + compiletime_assert_atomic_type(*p); \
> >> + switch (sizeof(*p)) { \
> >> + case 1: \
> >> + case 2: \
> >> + __u.__val = READ_ONCE(*p); \
> >> + smb_mb(); \
> >> + break; \
> >> + case 4: \
> >> + __asm__ __volatile__ ( \
> >> + "amoor.w.aq %1, zero, %0" \
> >> + : "+A" (*p) \
> >> + : "=r" (__u.__val) \
> >> + : "memory"); \
> >> + break; \
> >> + case 8: \
> >> + __asm__ __volatile__ ( \
> >> + "amoor.d.aq %1, zero, %0" \
> >> + : "+A" (*p) \
> >> + : "=r" (__u.__val) \
> >> + : "memory"); \
> >> + break; \
> >> + } \
> >> + __u.__val; \
> >> +} while (0)
> >
> > 'creative' use of amoswap and amoor :-)
> >
> > You should really look at a normal load with ordering instruction
> > though, that amoor.aq is a rmw and will promote the cacheline to
> > exclusive (and dirty it).
>
> The thought here was that implementations could elide the MW by pattern
> matching the "zero" (x0, the architectural zero register) forms of AMOs where
> it's interesting. I talked to one of our microarchitecture guys, and while he
> agrees that's easy he points out that eliding half the AMO may wreak havoc on
> the consistency model. Since we're not sure what the memory model is actually
> going to look like, we thought it'd be best to just write the simplest code
> here
>
> /*
> * TODO_RISCV_MEMORY_MODEL: While we could emit AMOs for the W and D sized
> * accesses here, it's questionable if that actually helps or not: the lack of
> * offsets in the AMOs means they're usually preceded by an addi, so they
> * probably won't save code space. For now we'll just emit the fence.
> */
> #define __smp_store_release(p, v) \
> ({ \
> compiletime_assert_atomic_type(*p); \
> smp_mb(); \
> WRITE_ONCE(*p, v); \
> })
>
> #define __smp_load_acquire(p) \
> ({ \
> union{typeof(*p) __p; long __l;} __u; \

AFAICT, this has an endianness issue, no?

Assume typeof(*p) is char and *p == 1, on a big-endian 32-bit platform:

> compiletime_assert_atomic_type(*p); \
> __u.__l = READ_ONCE(*p); \

READ_ONCE(*p) is 1, so __u.__l is now 0x00 00 00 01, with the most
significant byte at the lowest address.

> smp_mb(); \
> __u.__p; \
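
__u.__p is then 0x00, because on big endian the char member overlays the
most significant byte of __u.__l rather than the byte holding the 1.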

Am I missing something here?

Even so, why not use the simple definition from
include/asm-generic/barrier.h?
Regards,
Boqun
> })
>
[...]