Re: [PATCH 00/14] alpha: cleanups for 6.10

From: Maciej W. Rozycki
Date: Wed May 29 2024 - 14:50:44 EST


On Tue, 28 May 2024, Paul E. McKenney wrote:

> > > > This topic came up again when Paul E. McKenney noticed that
> > > > parts of the RCU code already rely on byte access and do not
> > > > work on alpha EV5 reliably, so I refreshed my series now for
> > > > inclusion into the next merge window.
> > >
> > > Hrrrm? That sounds like Paul ran tests on EV5, did he?
> >
> > What exactly is required to make it work?
>
> Whatever changes are needed to prevent the data corruption that can
> currently result from code generated for single-byte stores. For but one
> example, consider a pair of tasks (or one task and an interrupt handler
> in the CONFIG_SMP=n case) that each do a single-byte store to one of a
> pair of bytes in the same machine word. As I understand it, in code
> generated for older Alphas, both "stores" will load the word containing
> that byte, update their own byte, and store the updated word.
>
> If two such single-byte stores run concurrently, one or the other of those
> two stores will be lost, as in overwritten by the other. This is a bug,
> even in kernels built for single-CPU systems. And a rare bug at that, one
> that tends to disappear as you add debug code in an attempt to find it.

Thank you for the detailed description of the problematic scenario.

 I hope someone will find it useful; however, for the record, I have been
familiar with the intricacies of the Alpha architecture, as well as their
implications for software, for decades now. The Alpha port of Linux was
the first non-x86 Linux platform I used, and actually (I've chased that
up as a matter of interest) my first ever contribution to Linux was for
Alpha platform code:

On Mon, 30 Mar 1998, Jay.Estabrook@xxxxxxxxxxx wrote:

> Hi, sorry about the delay in answering, but you'll be happy to know, I took
> your patches and merged them into my latest SMP patches, and submitted them
> to Linus just last night. He promises them to (mostly) be in 2.1.92, so we
> can look forward to that... :-)

so I find the scenario you have described more than obvious.

Mind that the read-modify-write sequence that software does for sub-word
write accesses with original Alpha hardware is precisely what hardware
would have to do anyway and support for that was deliberately omitted by
the architecture designers from the ISA to give it performance advantages
quoted in the architecture manual. The only difference here is that with
hardware read-modify-write operations atomicity for sub-word accesses is
guaranteed by the ISA, however for software read-modify-write it has to be
explicitly coded using the usual load-locked/store-conditional sequence in
a loop. I don't think it's a big deal really, it should be trivial to do
in the relevant accessors, along with the memory barriers that are needed
anyway for EV56+ and possibly other ports such as the MIPS one.
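
 To illustrate the kind of accessor meant above, here is a minimal sketch
in portable C11 rather than Alpha assembly. The helper name
store_byte_atomic() is hypothetical and not an existing kernel interface,
and the compare-and-swap retry loop merely stands in for an ldl_l/stl_c
sequence. Memory ordering is left relaxed on the assumption that any
barriers required would be supplied separately by the caller.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: store VAL into byte BYTE_IDX
 * (0 = least significant, matching little-endian byte order) of the
 * aligned 32-bit word at WORD without disturbing the other bytes.
 * The compare-and-swap retry loop stands in for an ldl_l/stl_c loop.
 */
static inline void store_byte_atomic(_Atomic uint32_t *word,
				     unsigned int byte_idx, uint8_t val)
{
	unsigned int shift = (byte_idx & 3) * 8;
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
	uint32_t new;

	do {
		/* Merge the new byte into the most recently observed word. */
		new = (old & ~mask) | ((uint32_t)val << shift);
		/* On failure OLD is refreshed with the current contents. */
	} while (!atomic_compare_exchange_weak_explicit(word, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));
}
```

 If a concurrent writer changes a neighbouring byte between the load and
the exchange, the exchange fails and the loop retries with the fresh word,
so neither update is lost.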

What I have been after actually is: can you point me at a piece of code
in our tree that will cause an issue with a non-BWX Alpha as described in
your scenario, so that I have a starting point? Mind that I'm completely
new to RCU as I didn't have a need to research it before (though from a
skim over Documentation/RCU/rcu.rst I understand what its objective is).

 FWIW, even if it were only me, I think that depriving the already thin
Alpha port developer base of any of its remaining manpower by dropping
support for a subset of the available hardware, and a subset that is not
nearly as exotic as the original i386 had become for the x86 platform by
the time support for it was dropped, is only going to lead to further
decline and the eventual removal of the entire port.

And I think it would be good if we kept the port, just as we keep other
ports of historical significance only, for educational reasons if nothing
else, such as to let people understand based on an actual example, once
mainstream, the implications of weakly ordered memory systems.

Maciej