Re: Finding open-coded workarounds for 1/2-byte cmpxchg()?

From: Paul E. McKenney
Date: Fri Apr 05 2024 - 19:18:40 EST


On Sat, Apr 06, 2024 at 01:00:35AM +0200, Julia Lawall wrote:
>
>
> On Thu, 4 Apr 2024, Paul E. McKenney wrote:
>
> > Hello, Julia!
> >
> > I hope that things are going well for you and yours.
> >
> > TL;DR: Would you or one of your students be interested in looking for
> > some interesting code patterns involving cmpxchg? If such patterns exist,
> > we would either need to provide fixes or to drop support for old systems.
> >
> > If this would be of interest, please read on!
> >
> > Arnd (CCed) and I are looking for open-coded emulations for one-byte
> > and two-byte cmpxchg(). Such emulations might be attempting to work
> > around the fact that not all architectures support those sizes, being
> > as they are only required to support four-byte cmpxchg() and, if they
> > are 64-bit architectures, eight-byte cmpxchg().
> >
> > There is a one-byte emulation in RCU (kernel/rcu/tasks.h), which looks
> > like this:
> >
> > ------------------------------------------------------------------------
> >
> > u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new)
> > {
> > 	union rcu_special ret;
> > 	union rcu_special trs_old = READ_ONCE(t->trc_reader_special);
> > 	union rcu_special trs_new = trs_old;
> >
> > 	if (trs_old.b.need_qs != old)
> > 		return trs_old.b.need_qs;
> > 	trs_new.b.need_qs = new;
> > 	ret.s = cmpxchg(&t->trc_reader_special.s, trs_old.s, trs_new.s);
> > 	return ret.b.need_qs;
> > }
> >
> > ------------------------------------------------------------------------
> >
> > An additional issue is posed by these, also in kernel/rcu/tasks.h:
> >
> > ------------------------------------------------------------------------
> >
> > if (trs.b.need_qs == (TRC_NEED_QS_CHECKED | TRC_NEED_QS)) {
> >
> > return smp_load_acquire(&t->trc_reader_special.b.need_qs);
> >
> > smp_store_release(&t->trc_reader_special.b.need_qs, v);
> >
> > ------------------------------------------------------------------------
> >
> > The additional issue is that these statements assume that each CPU
> > architecture has single-byte load and store instructions, which some of
> > the older Alpha systems do not. Fortunately for me, Arnd was already
> > thinking in terms of removing support for these systems.
> >
> > But there are additional systems that do not support 16-bit loads and
> > stores. So if there is a 16-bit counterpart to rcu_trc_cmpxchg_need_qs()
> > on a quantity that is also subject to 16-bit loads or stores, either
> > that function needs adjustment or a few more ancient systems need to
> > lose their Linux-kernel support.
> >
> > Again, is looking for this sort of thing something that you or one of
> > your students would be interested in?
>
> Hello,
>
> I tried, but without much success. The following looks a little bit
> promising, e.g., the use of the variable name "want", but it's not clear that
> the rest of the context fits the pattern.

Thank you for digging into this!!!

> diff -u -p /home/julia/linux/net/sunrpc/xprtsock.c
> /tmp/nothing/net/sunrpc/xprtsock.c
> --- /home/julia/linux/net/sunrpc/xprtsock.c
> +++ /tmp/nothing/net/sunrpc/xprtsock.c
> @@ -690,12 +690,9 @@ xs_read_stream(struct sock_xprt *transpo
> if (ret <= 0)
> goto out_err;
> transport->recv.offset = ret;
> - if (transport->recv.offset != want)
> - return transport->recv.offset;

Agreed, though you are quite right that ->recv.copied and ->recv.offset
are different lengths. But yes, as you suggest below, there must be
a cmpxchg() of some type (cmpxchg(), cmpxchg_acquire(), ...) in the mix
somewhere. Also, the cmpxchg() must be applied to a pointer to either
a 32-bit or a 64-bit quantity, but the change must be 16 bits (or 8 bits).

> The semantic patch in question was:
>
> @r@
> expression olde;
> idexpression old;
> @@
>
> if (olde != old) { ... return olde; }
>
> @@
> expression newe != r.olde;
> idexpression nw;
> expression r.olde;
> idexpression r.old;
> @@
>
> *if (olde != old) { ... return olde; }
> ...
> *newe = nw;
> ...
> *return newe;
>
> The semantic patch doesn't include the cmpxchg. I wasn't sure if that
> would always be present, or in what form.

It would be, but I am having trouble characterizing exactly what the
pattern would look like beyond "emulating a 16-bit cmpxchg() using either
a 32-bit cmpxchg() or a 64-bit cmpxchg()". :-(
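
For concreteness, here is a minimal user-space sketch of the general shape
I have in mind. It assumes C11 atomics in place of the kernel's cmpxchg(),
and the names union word32 and emulated_cmpxchg_u16() are made up for
illustration; neither appears in the kernel. The 16-bit field shares a
32-bit word with unrelated state, and the update is done by a full-word
compare-and-swap on that containing word:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: a 16-bit field packed into a 32-bit word. */
union word32 {
	struct {
		uint16_t lo;	/* the 16-bit field being updated */
		uint16_t other;	/* unrelated neighboring state */
	} b;
	uint32_t s;		/* the whole word, as seen by the 32-bit CAS */
};

/*
 * Emulated 16-bit cmpxchg built on a 32-bit compare-and-swap.
 * Returns the value of ->b.lo that was observed: "old" on success,
 * something else on failure.
 */
static uint16_t emulated_cmpxchg_u16(_Atomic uint32_t *p,
				     uint16_t old, uint16_t newval)
{
	union word32 cur, nxt;

	cur.s = atomic_load(p);
	for (;;) {
		if (cur.b.lo != old)
			return cur.b.lo;	/* 16-bit compare failed */
		nxt = cur;
		nxt.b.lo = newval;		/* change only the 16-bit field */
		if (atomic_compare_exchange_strong(p, &cur.s, nxt.s))
			return old;		/* full-word CAS succeeded */
		/* CAS failed and refreshed cur.s; recheck and retry. */
	}
}
```

Unlike rcu_trc_cmpxchg_need_qs(), this sketch retries when only the
neighboring bits changed, which is a design choice rather than part of
the pattern. The part that seems common to such emulations is a narrow
compare against the old value, a copy of the wide value with the narrow
field replaced, and a cmpxchg() of some type on the containing 32-bit or
64-bit word.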

Thank you again, and something to think more about.

Thanx, Paul