Re: [PATCH 00/14] alpha: cleanups for 6.10
From: Linus Torvalds
Date: Thu May 30 2024 - 20:11:30 EST
On Thu, 30 May 2024 at 15:57, Maciej W. Rozycki <macro@xxxxxxxxxxx> wrote:
>
> On Wed, 29 May 2024, Linus Torvalds wrote:
> >
> > The 21064 actually did atomicity with an external pin on the bus, the
> > same way people used to do before caches even existed.
>
> Umm, 8086's LOCK#, anyone?
Well, yes and no.
So yes, exactly like 8086 did before having caches.
But no, not like the alpha contemporary PPro that did have caches. The
PPro already did locked cycles in the caches.
Yes, the PPro still did have an external lock pin (and in fact current
much more modern x86 CPUs do too), but it's only used for locked IO
accesses or possibly cacheline crossing accesses.
So x86 has supported atomic accesses on IO - and it is very very slow,
to this day. So slow, and problematic, in fact, that Intel is only now
trying to remove it (look up "split lock").
But the 21064 explicitly did not support locking on IO - and unaligned
LL/SC accesses obviously also did not work.
So I really feel the 21064 was broken.
It's probably related to the whole cache coherency being designed to
be external to the built-in caches - or even the Bcache. The caches
basically are write-through, and the weak memory ordering was designed
for allowing this horrible model.
> > In fact, it's worse than "not thread safe". It's not even safe on UP
> > with interrupts, or even signals in user space.
>
> Ouch, I find it a surprising oversight.
The sad part is that it doesn't seem to have been an oversight. It
really was broken-as-designed.
Basically, the CPU was designed for single-threaded Spec benchmarks
and absolutely nothing else. Classic RISC, where you recompile to fix
problems like the atomicity thing ("just use a 32-bit sig_atomic_t
and you're fine").
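To make the atomicity problem concrete, here is a minimal sketch in portable C of what a byte store turns into when the hardware can only load and store whole words. It is a model, not Alpha code: the helper names (rmw_load, rmw_store, lost_update_demo) are hypothetical, the word is assumed to be 32 bits, and the "interrupt" is simulated by calling the handler's store between the two halves of the mainline store.

```c
#include <assert.h>
#include <stdint.h>

/* Phase 1 of an emulated byte store: load the whole containing word. */
static uint32_t rmw_load(const volatile uint32_t *word)
{
    return *word;
}

/* Phase 2: merge the new byte into the previously loaded copy and
 * write the whole word back. Byte 0 is the least significant byte
 * (an assumption of this sketch). */
static void rmw_store(volatile uint32_t *word, uint32_t loaded,
                      unsigned idx, uint8_t val)
{
    assert(idx < 4);
    uint32_t shift = idx * 8;
    uint32_t merged = (loaded & ~(UINT32_C(0xff) << shift))
                    | ((uint32_t)val << shift);
    *word = merged;
}

/* Simulated interleaving: mainline code updates byte 0, and a handler
 * updates byte 1 between the mainline load and the mainline store.
 * Returns the final contents of the word. */
static uint32_t lost_update_demo(void)
{
    volatile uint32_t word = 0;

    uint32_t stale = rmw_load(&word);            /* mainline: load      */
    rmw_store(&word, rmw_load(&word), 1, 0xBB);  /* handler: full store */
    rmw_store(&word, stale, 0, 0xAA);            /* mainline: store     */

    return word;
}
```

In this model lost_update_demo() returns 0x000000AA: the handler's byte is overwritten by the stale copy, with no second CPU involved. A naturally aligned 32-bit store does not go through the load/merge/store sequence, which is the point of the "32-bit sig_atomic_t" advice.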
The original alpha architecture handbook makes a big deal of how
clever the lack of byte and word operations is. I also remember
reading an article by Dick Sites - one of the main designers - talking
a lot about how the lack of byte operations is great, and encourages
vectorizing byte accesses and doing string operations in whole words.
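For illustration, the "string operations in whole words" style looks roughly like the sketch below. It uses the well-known zero-byte bit trick on 64-bit words; it is not taken from the handbook or from the article, and the names (has_zero_byte, wordwise_strlen) and the requirement that the caller passes the buffer size are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Nonzero if and only if at least one byte of v is 0x00. */
static int has_zero_byte(uint64_t v)
{
    return ((v - ONES) & ~v & HIGHS) != 0;
}

/* Length of the string in buf, scanning eight bytes at a time.
 * buf_size is the size of the whole buffer, so full words can be
 * read without running past the end. */
static size_t wordwise_strlen(const char *buf, size_t buf_size)
{
    assert(buf != NULL);

    size_t i = 0;
    for (; i + 8 <= buf_size; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w);  /* whole-word load */
        if (has_zero_byte(w))
            break;                      /* NUL is somewhere in this word */
    }
    /* Finish byte by byte inside the word that holds the NUL
     * (or in the tail, if buf_size is not a multiple of 8). */
    while (i < buf_size && buf[i] != '\0')
        i++;
    return i;
}
```

For a 16-byte buffer holding "hello, alpha", wordwise_strlen() skips the first word in one test and finds the terminator in the second, returning 12.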
Linus