Finding open-coded workarounds for 1/2-byte cmpxchg()?

From: Paul E. McKenney
Date: Thu Apr 04 2024 - 18:51:30 EST


Hello, Julia!

I hope that things are going well for you and yours.

TL;DR: Would you or one of your students be interested in looking for
some interesting code patterns involving cmpxchg? If such patterns exist,
we would either need to provide fixes or to drop support for old systems.

If this would be of interest, please read on!

Arnd (CCed) and I are looking for open-coded emulations of one-byte
and two-byte cmpxchg(). Such emulations might be attempting to work
around the fact that not all architectures support those sizes:
architectures are required to support only four-byte cmpxchg() and,
if they are 64-bit architectures, eight-byte cmpxchg().

There is a one-byte emulation in RCU (kernel/rcu/tasks.h), which looks
like this:

------------------------------------------------------------------------

u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new)
{
	union rcu_special ret;
	union rcu_special trs_old = READ_ONCE(t->trc_reader_special);
	union rcu_special trs_new = trs_old;

	if (trs_old.b.need_qs != old)
		return trs_old.b.need_qs;
	trs_new.b.need_qs = new;
	ret.s = cmpxchg(&t->trc_reader_special.s, trs_old.s, trs_new.s);
	return ret.b.need_qs;
}

------------------------------------------------------------------------
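
An open-coded emulation need not use a union as the RCU code above does.
It might instead use shifts and masks on the containing four-byte word.
The following is only a user-space sketch of that alternative shape, using
C11 atomics rather than the kernel's cmpxchg(). The function name
emulated_cmpxchg_u8() and its "idx" parameter are hypothetical and do not
appear anywhere in the kernel. Unlike rcu_trc_cmpxchg_need_qs(), this
sketch retries when only the neighboring bytes have changed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical user-space sketch, not kernel code: emulate a one-byte
 * compare-and-exchange using only a four-byte one.  "idx" selects the
 * byte by significance (0 = least-significant byte), which sidesteps
 * endianness for the purposes of this illustration.
 */
static uint8_t emulated_cmpxchg_u8(_Atomic uint32_t *word, unsigned int idx,
				   uint8_t old, uint8_t new)
{
	unsigned int shift = idx * 8;
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t cur = atomic_load(word);

	assert(idx < 4);
	for (;;) {
		uint8_t cur_byte = (cur & mask) >> shift;
		uint32_t repl;

		if (cur_byte != old)
			return cur_byte;	/* Mismatch: fail, as cmpxchg() would. */
		repl = (cur & ~mask) | ((uint32_t)new << shift);
		/* On failure, "cur" is refreshed with the word's current value. */
		if (atomic_compare_exchange_weak(word, &cur, repl))
			return old;
	}
}
```

Code of roughly this shape, with a four-byte cmpxchg() applied to a word
that was masked or shifted just beforehand, is one candidate pattern for
such a search.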

An additional issue is posed by these statements, also in kernel/rcu/tasks.h:

------------------------------------------------------------------------

if (trs.b.need_qs == (TRC_NEED_QS_CHECKED | TRC_NEED_QS)) {

return smp_load_acquire(&t->trc_reader_special.b.need_qs);

smp_store_release(&t->trc_reader_special.b.need_qs, v);

------------------------------------------------------------------------

The additional issue is that these statements assume that each CPU
architecture has single-byte load and store instructions, which some of
the older Alpha systems do not. Fortunately for me, Arnd was already
thinking in terms of removing support for these systems.

But there are additional systems that do not support 16-bit loads and
stores. So if there is a 16-bit counterpart to rcu_trc_cmpxchg_need_qs()
on a quantity that is also subject to 16-bit loads or stores, either
that function needs adjustment or a few more ancient systems need to
lose their Linux-kernel support.
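
One plausible way for this to go wrong can be replayed in a few lines of C.
This is an illustration under stated assumptions, not an analysis of any
particular system. The assumption is that a system lacking a 16-bit store
instruction must carry out such a store as a non-atomic read-modify-write
of the containing 32-bit word. If a four-byte cmpxchg() on the same word
lands between the read and the write-back, its update is lost. All names
below are hypothetical, and the single-threaded model simply replays the
interleaving in a fixed order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 32-bit memory word shared by two CPUs. */
static uint32_t shared_word;

/* Step 1 of an emulated 16-bit store: load the containing word. */
static uint32_t emulated_store16_load(void)
{
	return shared_word;
}

/* Step 2: merge the new low half into the stale copy and write it back. */
static void emulated_store16_writeback(uint32_t stale, uint16_t val)
{
	shared_word = (stale & 0xffff0000u) | val;
}

/* Stand-in for a four-byte cmpxchg(); plain C suffices in this model. */
static uint32_t model_cmpxchg32(uint32_t old, uint32_t new)
{
	uint32_t cur = shared_word;

	if (cur == old)
		shared_word = new;
	return cur;
}

/* Replay the bad interleaving and return the word's final contents. */
static uint32_t replay_lost_update(void)
{
	uint32_t stale, ret;

	shared_word = 0x00000000;
	/* CPU A begins its emulated 16-bit store to the low half. */
	stale = emulated_store16_load();
	/* CPU B's cmpxchg updates the high half and succeeds. */
	ret = model_cmpxchg32(0x00000000, 0xbeef0000);
	assert(ret == 0x00000000);
	/* CPU A writes back its stale copy, wiping out CPU B's update. */
	emulated_store16_writeback(stale, 0x1234);
	return shared_word;
}
```

In this replay the word ends up as 0x00001234, so the 0xbeef that the
cmpxchg stored into the high half has vanished even though the cmpxchg
reported success.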

Again, is looking for this sort of thing something that you or one of
your students would be interested in?

Thanx, Paul