Re: [WIP 0/3] Memory model and atomic API in Rust

From: Kent Overstreet
Date: Mon Mar 25 2024 - 23:28:59 EST


On Tue, Mar 26, 2024 at 01:35:46AM +0000, Dr. David Alan Gilbert wrote:
> OK, so that's essentially the opposite worry of what I was saying; I was
> worrying about people forgetting to use an atomic access to a shared
> variable; I think you're worrying about people forgetting to mark
> a variable shared and since the accesses are the same nothing shouts?

In biological evolution, novel useful traits are generally not
accessible via a single mutation; many neutral mutations are required
first.

Evolution is able to proceed quickly because there are a great many
neutral mutations (that is, evolution quickly searches all possible
paths to find accessible positive mutations), and because negative
mutations are culled quickly - often before the first cell division.

(The most common mutation is the addition or deletion of a base pair;
but amino acids are coded for by groups of three base pairs, so that
shifts everything on the chromosome after the mutation so that it codes
for completely different amino acids. That cell won't live to divide
again).

Actual genetic diseases that significantly impair fitness are quite
rare, and if they weren't we'd have a major problem.

Programming at scale is million monkeys stuff - we're all hammering on
our keyboards at random; the good programs survive and the bad programs
are forgotten.

Similarly to biological evolution, we want most edits to a program to
result in a program that either still works, or fails immediately -
fails to compile, or is caught immediately by basic testing.

If edits can result in latent undefined behaviour or programs that
_mostly_ work, and then explode in unpredictable ways weeks/months/years
later - that's a huge problem. In the worst case, those bugs/negative
mutations accumulate faster than they can be culled.

Thank god we have source control.

Places where we're working with extremely loose synchronization - no
locking, raw memory barriers - are the worst kind of hand grenade: they
result in bugs that are impossible to cull quickly.

In kernel programming, we're always walking around with live hand
grenades.

So what do we do?

We slow down, we take every step slowly and intentionally while telling
everyone not to bump us because we're holding a live hand grenade - raw
atomics, raw unlocked variables, memory barriers, they all get special
care and extra comments.
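
Concretely, "special care and extra comments" looks something like the
sketch below - a user-space C++ sketch with hypothetical names, not code
from this thread (in the kernel the same discipline shows up as
READ_ONCE()/WRITE_ONCE(), smp_load_acquire()/smp_store_release() and
comments saying what each barrier pairs with). Every access that matters
is spelled out at the use site, with its ordering and its pairing
documented:

#include <atomic>

// Hypothetical names throughout; a sketch of the style, not kernel code.
struct msg {
	int payload;			// written before publication, read after
	std::atomic<bool> ready{false};
};

void publish(msg &m, int v)
{
	m.payload = v;			// plain store: not visible to anyone yet

	// Pairs with the acquire load in consume(): the release orders the
	// payload store before the flag can be observed as true.
	m.ready.store(true, std::memory_order_release);
}

int consume(msg &m)
{
	// Pairs with the release store in publish().
	while (!m.ready.load(std::memory_order_acquire))
		;			// spin until published
	return m.payload;		// ordered after the acquire, so safe
}

If the ordering is wrong, it's wrong in a line you can point at, next to
a comment saying what it was supposed to pair with.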

And we all have fuck-tons of code we need to be able to understand,
review, debug and maintain, so we always try to write our code in a
style where, if it's wrong, we'll see that _locally_, without having to
go through and remember how _everything_ in the possibly thousands of
relevant lines works.

I'm personally responsible for over 100k LOC of highly intricate code
with high consequences for failure, and regularly have to debug issues
arising somewhere in north of a million LOC - and when something goes
wrong I have to be able to fully debug it _quickly_.

What C++ does is like taking those hand grenades, with the pin already
out - and leaving one under the couch cushions, another in the
silverware drawer, another in the laundry basket - and expecting you to
remember where you put them.

Going back to the C++ example, the really fatal thing with how they do
it is that a change in one line of code can completely change the
semantics of a bunch of other code; no human reviewer can be expected
to catch the bugs that might introduce, and the compiler certainly
won't.
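
To make that concrete, here's a minimal sketch of the kind of thing
being described - hypothetical names, not code from this thread. With
std::atomic, the declaration alone decides whether every plain-looking
access is an ordinary memory access or a sequentially consistent atomic
operation; the use sites themselves never change:

#include <atomic>

// Hypothetical illustration. The declaration is the only line that
// differs between the two versions; every use site below is textually
// identical in both.

// int		 ready;		// version A: plain int - the accesses below
				// are ordinary, unsynchronized loads/stores
std::atomic<int> ready;		// version B: with this one-line change, every
				// plain-looking access below becomes a
				// seq_cst atomic operation

void set_ready()
{
	ready = 1;		// A: plain store.  B: atomic store (seq_cst).
}

void wait_ready()
{
	while (!ready)		// A: data race; a load the compiler may hoist.
		;		// B: a fresh seq_cst atomic load each pass.
}

A diff that only touches the declaration leaves every access site reading
exactly the same, so a reviewer looking at the surrounding code has
nothing local to catch.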

Now imagine multiple people working on the same code, at different
times.

Now imagine patches getting mixed up, reordered, one of them getting
lost, merge conflicts - i.e. shit that happens all the time - and what
happens if you're using C++-style atomics.

Terrifying stuff.