Re: liburcu: LTO breaking rcu_dereference on arm64 and possibly other architectures ?

From: Mathieu Desnoyers
Date: Fri Apr 16 2021 - 15:30:57 EST


----- On Apr 16, 2021, at 3:02 PM, paulmck paulmck@xxxxxxxxxx wrote:
[...]
>
> If it can be done reasonably, I suggest also having some way for the
> person building userspace RCU to say "I know what I am doing, so do
> it with volatile rather than memory_order_consume."

Like so?

#define CMM_ACCESS_ONCE(x) (*(__volatile__ __typeof__(x) *)&(x))
#define CMM_LOAD_SHARED(p) CMM_ACCESS_ONCE(p)

/*
 * By defining URCU_DEREFERENCE_USE_VOLATILE, the user requests use of
 * volatile access to implement rcu_dereference rather than a
 * memory_order_consume load from the C11/C++11 standards.
 *
 * This may improve performance on weakly-ordered architectures where
 * the compiler implements memory_order_consume as
 * memory_order_acquire, which is stricter than required by the
 * standard.
 *
 * Note that using volatile accesses for rcu_dereference may cause
 * LTO to generate incorrectly ordered code starting from C11/C++11.
 */

#ifdef URCU_DEREFERENCE_USE_VOLATILE
# define rcu_dereference(x)	CMM_LOAD_SHARED(x)
#else
# if defined (__cplusplus)
#  if __cplusplus >= 201103L
#   include <atomic>
#   define rcu_dereference(x)	((std::atomic<__typeof__(x)>)(x)).load(std::memory_order_consume)
#  else
#   define rcu_dereference(x)	CMM_LOAD_SHARED(x)
#  endif
# else
#  if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L)
#   include <stdatomic.h>
#   define rcu_dereference(x)	atomic_load_explicit(&(x), memory_order_consume)
#  else
#   define rcu_dereference(x)	CMM_LOAD_SHARED(x)
#  endif
# endif
#endif
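
To make the effect of the build-time switch concrete, here is a minimal,
single-threaded sketch of a reader going through rcu_dereference in either
flavor. It is only an illustration under stated assumptions: struct config,
shared_config, publish_config() and read_config_value() are hypothetical
names, the consume flavor uses the GCC/Clang __atomic_load_n builtin (which
accepts a plain, non-_Atomic object) rather than the <atomic>/<stdatomic.h>
forms above, and the read-side critical section and grace-period handling
that real RCU code needs are left out.

```c
#include <assert.h>
#include <stddef.h>

#define CMM_ACCESS_ONCE(x)	(*(__volatile__ __typeof__(x) *)&(x))
#define CMM_LOAD_SHARED(p)	CMM_ACCESS_ONCE(p)

#ifdef URCU_DEREFERENCE_USE_VOLATILE
/* "I know what I am doing": plain volatile load. */
# define rcu_dereference(x)	CMM_LOAD_SHARED(x)
#else
/* Assumption: GCC/Clang builtin, not the macro from this thread. */
# define rcu_dereference(x)	__atomic_load_n(&(x), __ATOMIC_CONSUME)
#endif

/* Hypothetical RCU-protected data, for illustration only. */
struct config {
	int value;
};

static struct config *shared_config;

/* Publish a fully initialized object with release semantics. */
static void publish_config(struct config *c)
{
	assert(c != NULL);
	__atomic_store_n(&shared_config, c, __ATOMIC_RELEASE);
}

/*
 * Reader: load the pointer once through rcu_dereference, then access
 * the object only through that local copy. Returns -1 if nothing has
 * been published yet.
 */
static int read_config_value(void)
{
	struct config *c = rcu_dereference(shared_config);

	return c ? c->value : -1;
}
```

Building the same file with and without -DURCU_DEREFERENCE_USE_VOLATILE
selects between the two loads; the caller of read_config_value() does not
change.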

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com