Re: C aggregate passing (Rust kernel policy)

From: Ventura Jack
Date: Thu Mar 06 2025 - 13:49:27 EST

Next message: Mike Rapoport: "[PATCH 00/13] arch, mm: reduce code duplication in mem_init()"
Previous message: linux: "[PATCH net-next] net: phylink: Remove unused phylink_init_eee"
In reply to: Ralf Jung: "Re: C aggregate passing (Rust kernel policy)"
Next in thread: Ventura Jack: "Re: C aggregate passing (Rust kernel policy)"

On Tue, Mar 4, 2025 at 11:24 AM Ralf Jung <post@xxxxxxxx> wrote:
>
> Hi all,
>
> >>> The time crate breaking example above does not
> >>> seem nice.
> >>
> >> The time issue is like the biggest such issue we had ever, and indeed that did
> >> not go well. We should have given the ecosystem more time to update to newer
> >> versions of the time crate, which would have largely mitigated the impact of
> >> this. A mistake was made, and a *lot* of internal discussion followed to
> >> minimize the chance of this happening again. I hope you don't take that accident
> >> as being representative of regular Rust development.
> >
> > Was it an accident? I thought the breakage was intentional,
> > and in line with Rust's guarantees on backwards
> > compatibility, since it was related to type inference,
> > and Rust is allowed to do breaking changes for that
> > according to its guarantees as I understand it.
> > Or do you mean that it was an accident that better
> > mitigation was not done in advance, like you describe
> > with giving the ecosystem more time to update?
>
> It was an accident. We have an established process for making such changes while
> keeping the ecosystem impact to a minimum, but mistakes were made and so the
> ecosystem impact was beyond what we'd be willing to accept.
>
> The key to understand here that there's a big difference between "we do a
> breaking change but hardly anyone notices" and "we do a breaking change and
> everyone hears about it". The accident wasn't that some code broke, the accident
> was that so much code broke. As you say, we have minor breaking changes fairly
> regularly, and yet all the examples you presented of people being upset were
> from this one case where we screwed up. I think that shows that generally, the
> process works: we can do minor breaking changes without disrupting the
> ecosystem, and we can generally predict pretty well whether a change will
> disrupt the ecosystem. (In this case, we actually got the prediction and it was
> right! It predicted significant ecosystem breakage. But then diffusion of
> responsibility happened and nobody acted on that data.)
>
> And yes, *technically* that change was permitted as there's an exception in the
> stability RFC for such type ambiguity changes. However, we're not trying to be
> "technically right", we're trying to do the right thing for the ecosystem, and
> the way this went, we clearly didn't do the right thing. If we had just waited
> another 3 or 4 Rust releases before rolling out this change, the impact would
> have been a lot smaller, and you likely would never have heard about this.
>
> (I'm saying "we" here since I am, to an extent, representing the Rust project in
> this discussion. I can't actually speak for the Rust project, so these opinions
> are my own. I also was not involved in any part of the "time" debacle.)

As I understand it, these comments claim that other things
went wrong as well.

https://internals.rust-lang.org/t/type-inference-breakage-in-1-80-has-not-been-handled-well/21374

"There has been no public communication about this.
There were no future-incompat warnings. The affected
crates weren't yanked. There wasn't even a blog post
announcing the problem ahead of time and urging users
to update the affected dependency. Even the 1.80 release
announcement didn't say a word about the incompatibility
with one of the most used Rust crates."

https://internals.rust-lang.org/t/type-inference-breakage-in-1-80-has-not-been-handled-well/21374/9

"Why yank?

These crates no longer work on any supported Rust version
(which is 1.80, because the Rust project doesn't support past
versions). They're permanently defunct.

It makes Cargo alert users of the affected versions that
there's a problem with them.

It prevents new users from locking to the broken versions.

and if yanking of them seems like a too drastic measure
or done too soon, then breaking them was also done too
hard too soon."

And the time crate issue happened less than a year ago.

One thing that confuses me is that an earlier change, said to
be similar to the time crate issue, was rejected in 2020, and
yet in 2024 some were considering making that change after all,
despite it possibly causing similar breakage.

https://internals.rust-lang.org/t/type-inference-breakage-in-1-80-has-not-been-handled-well/21374/19

"On the other hand, @dtolnay, who objected to
impl AsRef for Cow<'_, str> on the grounds of
type inference breakage, announced that the libs
team explictly decided to break time's type inference,
which is inconsistent. But if this was deliberate and
deemed a good outcome, perhaps that AsRef impl
should be reconsidered, after all?"

https://github.com/rust-lang/rust/pull/73390
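
As a rough illustration of how adding a trait impl can break
type inference, here is a minimal sketch. It is not the actual
time crate code: Make, Wrapper and inferred_size are hypothetical
names that I made up for this example. As I understand it, the
time case involved a Box<_> whose type parameter was inferred
through collect(), and the general mechanism is the same: while
only one impl matches, the compiler uses it to fill in the `_`.

```rust
// Hypothetical stand-ins: `Make` plays the role of a library trait,
// `Wrapper` the role of a generic container.
trait Make {
    fn make() -> Self;
}

struct Wrapper<T>(T);

impl Make for Wrapper<u8> {
    fn make() -> Self {
        Wrapper(7)
    }
}

// Uncommenting this second impl makes `Wrapper<_>` below ambiguous,
// and the compiler then reports that type annotations are needed.
// impl Make for Wrapper<u32> {
//     fn make() -> Self {
//         Wrapper(7)
//     }
// }

fn inferred_size() -> usize {
    // With a single matching impl, `_` is inferred as `u8`.
    let w: Wrapper<_> = Make::make();
    std::mem::size_of_val(&w.0)
}

fn main() {
    println!("inferred payload size: {} byte(s)", inferred_size());
}
```

The point of the sketch is that the downstream code never changes.
It stops compiling only because an impl was added elsewhere.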

There have been other issues as well. I searched the rust-lang/rust
issue tracker for the "regression-from-stable-to-stable" label:

https://github.com/rust-lang/rust/issues?q=label%3A%22regression-from-stable-to-stable%22%20sort%3Acomments-desc%20

A number of issues show up. Most of these do not seem to be
intentional breakage, to be fair. Some of the relatively recent
ones, from 2020 and later, include the following.

https://github.com/rust-lang/rust/issues/89195

"Compilation appears to loop indefinitely"

https://github.com/tokio-rs/axum/issues/200#issuecomment-948888360

"I ran into the same problem of extremely slow
compile times on 1.56, both sondr3/cv-aas and
sondr3/web take forever to compile."

This one started as a nightly regression, but was changed
to "stable to stable regression".

https://github.com/rust-lang/rust/issues/89601

"nightly-2021-09-03: Compiler hang in project with a
lot of axum crate routes"

This one is from 2023, still open, though it may have been
solved or mitigated later for some cases.

https://github.com/rust-lang/rust/issues/115283

"Upgrade from 1.71 to 1.72 has made compilation
time of my async-heavy actix server 350 times
slower (from under 5s to 30 minutes, on a 32GB M1
Max CPU)."

This one is from 2020, still open, though with mitigation
and fixes for some cases as I understand it. 35 thumbs up.

https://github.com/rust-lang/rust/issues/75992

"I upgraded from 1.45 to 1.46 today and a crate
I'm working on seems to hang forever while compiling."

Some of the issues may be related to holes in the
type system, and therefore may be fundamentally
difficult to fix. I can imagine that there might be
some examples that are similar for C++ projects,
but C++ has a less advanced type system than Rust,
with no advanced solver, so I would guess that there
are fewer such examples for C++. And a project
can switch to a different C++ compiler. Hopefully
gccrs will be ready in the near future such that
Rust projects can do similar switching. Though as I
understand it, a lot of the type checking
implementation will be shared between rustc and
gccrs. For C, the language should be simple enough that
these kinds of issues are very rare or never occur.

> > Another concern I have is with Rust editions. It is
> > a well defined way of having language "versions",
> > and it does have automated conversion tools,
> > and Rust libraries choose themselves which
> > edition of Rust that they are using, independent
> > of the version of the compiler.
> >
> > However, there are still some significant changes
> > to the language between editions, and that means
> > that to determine the correctness of Rust code, you
> > must know which edition it is written for.
>
> There exist corner cases where that is true, yes. They are quite rare. Congrats
> on finding one! But you hardly ever see such examples in practice. As above,
> it's important to think of these things quantitatively, not qualitatively.

What do you mean "congrats"?

I think that one should consider both "quantitatively"
and also "qualitatively".

I do not know how rare they are. One can go through the changes
in the Rust edition guide and look at them. Here are a few more
that I found. I should stress that these issues have automated
upgrading or lints for them. For some of the Rust edition changes,
there are no automated upgrade tools, only lints.

https://doc.rust-lang.org/edition-guide/rust-2021/disjoint-capture-in-closures.html

"Changing the variables captured by a closure
can cause programs to change behavior or to stop
compiling in two cases:

changes to drop order, or when destructors run (details);

changes to which traits a closure implements (details)."
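
To make the drop order case concrete, here is a small sketch
under my own assumptions (Noisy, Pair and drop_order are
hypothetical names, not taken from the edition guide). The same
source records a different order of events depending on the
edition it is compiled with: before Rust 2021 the closure
captures all of `p`, while from Rust 2021 on it captures only
`p.a`, so `p.b` is dropped later.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

// Records its own destruction in a shared log.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

#[allow(dead_code)]
struct Pair {
    a: Noisy,
    b: Noisy,
}

fn drop_order() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let p = Pair {
            a: Noisy { name: "a", log: log.clone() },
            b: Noisy { name: "b", log: log.clone() },
        };
        // The closure body only mentions `p.a`.
        let c = move || {
            let _borrow = &p.a;
        };
        drop(c);
        log.borrow_mut().push("closure dropped".to_string());
        // In Rust 2021 and later, `p.b` is still alive here and is
        // dropped at the end of this block.
    }
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in drop_order() {
        println!("{}", event);
    }
}
```

With an edition before 2021, "drop b" is logged before "closure
dropped". With 2021 or later, it is logged after. Nothing in the
source itself tells the reader which of the two applies.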

https://doc.rust-lang.org/edition-guide/rust-2024/never-type-fallback.html

"In some cases your code might depend on the
fallback type being (), so this can cause compilation
errors or changes in behavior."

I am not sure whether this has changed behavior
between editions.

https://doc.rust-lang.org/edition-guide/rust-2024/rpit-lifetime-capture.html

"Without this use<> bound, in Rust 2024, the
opaque type would capture the 'a lifetime
parameter. By adding this bound, the migration
lint preserves the existing semantics."

As far as I can tell, there are more changes in the
Rust 2024 edition than in the previous editions.
Will future Rust editions, like Rust edition 2027,
have even more changes, including more with
semantic changes?

One way to avoid some of the issues with having
to understand and keep in mind the semantic
differences between Rust editions, might be
to always upgrade a Rust project to the most
recent Rust edition, before attempting to do
maintenance or development on that project.
But upgrading to the next Rust edition might
be a fair bit of work in some cases, and require
understanding the semantic differences
between editions in some cases. Especially when
macros are involved, as I understand it. The
migration guides often have a number of steps
involved, and the migration may sometimes be
so complex that the migration is done gradually.
As I understand it, this guide reports that upgrading
a specific project from 2021 to 2024 was not a lot of
work, but the upgrade was still done gradually.

https://codeandbitters.com/rust-2024-upgrade/

Learning materials and documentation might also
need to be updated.

I really hope that Rust edition 2027 will have fewer,
not more, semantic changes. Rust edition 2024
seems to me to have had more semantic changes
compared to previous editions.

If the Linux kernel had 1 million LOC of Rust, and
it was desired to upgrade to a new edition, what
might that look like? Or, would the kernel just let
different Rust codebases have different editions?
Rust does enable Rust crates with different
editions to interact, as I understand it, but
at the very least, one would have to be careful
with remembering what edition one is working
in, and what the semantics are for that edition.

Does upgrading to a new edition potentially
require understanding a specific project,
or can it always be done without knowing or
understanding the specific codebase?
There are not always automated tools available
for upgrading, sometimes only lints are
available, as I understand it. Would upgrading
a Linux kernel driver written in Rust to a new
edition require understanding that driver?
If yes, it might be easier to let drivers stay
on older Rust editions in some cases.

Best, VJ.
