Re: C aggregate passing (Rust kernel policy)
From: Linus Torvalds
Date: Sat Feb 22 2025 - 16:47:06 EST
On Sat, 22 Feb 2025 at 13:22, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> Power hungry and prone to information leaks, though.
The power argument is bogus.
The fact is, high performance is _always_ "inefficient". Anybody
who doesn't understand that doesn't understand reality.
And I very much say "reality". Because it has nothing to do with CPU
design, and everything to do with "that is how reality is".
Look at biology. Look at absolutely _any_ other area of
technology. Are you a car nut? Performance cars are not efficient.
Efficiency comes at a very real cost in performance. It's basically a
fundamental rule of entropy, but if you want to call it anything else,
you can attribute it to me.
Being a high-performance warm-blooded mammal takes a lot of energy,
but only a complete nincompoop then takes that as a negative. You'd be
*ignorant* and stupid to make that argument.
But somehow when it comes to technology, people _do_ make that
argument, and other people take those clowns seriously. It boggles the
mind.
Being a snake is a _hell_ of a lot more "efficient". You might only
need to eat once a month. But you have to face the reality that that
particular form of efficiency comes at a very real cost, and saying
that being "cold-blooded" is more efficient than being a warm-blooded
mammal is in many ways a complete lie and is distorting the truth.
It's only more efficient within the narrow band where it works, and
only if you are willing to take the very real costs that come with it.
If you need performance in the general case, it's not at all more
efficient any more: it's dead.
Yes, good OoO takes power. But I claim - and history backs me up -
that it pays for that power by outperforming the alternatives.
The people who try to claim anything else are deluded and wrong, and
are making arguments based on fever dreams and hopes and rose-tinted
glasses.
It wasn't all that long ago that the ARM people claimed that their
in-order cores were better because they were lower power and more
efficient. Guess what? When they needed higher performance, those
delusions stopped, and they don't make those stupid and ignorant
arguments any more. They still try to mumble about "little" cores, but
if you look at the undisputed industry leader in ARM cores (hint: it
starts with an 'A' and sounds like a fruit), even the "little" cores
are OoO.
The VLIW people have proclaimed the same efficiency advantages for
decades. I know. I was there (with Peter ;), and we tried. We were
very very wrong.
At some point you just have to face reality.
The vogue thing now is to talk about explicit parallelism, and how
just taking lots of those lower-performance (but thus more "efficient" -
not really: they are just targeting a different performance envelope)
cores can perform as well as OoO cores.
And that's _lovely_ if your load is actually that parallel and you
don't need a power-hungry cross-bar to make them all communicate very
closely.
So if you're a GPU - or, as we call them now, AI accelerators - you'd
be stupid to do anything else.
Don't believe the VLIW hype. It's literally the snake of the CPU
world: it can be great in particular niches, but it's not some "answer
to efficiency". Keep it in your DSPs, and make your GPUs use a
metric shit-load of them, but don't think that being good at one thing
somehow makes you the solution for general-purpose computing.
It's not like VLIW hasn't been around for many decades. And there's a
reason you don't see it in GP CPUs.
Linus