Re: Linux kernel in-tree Rust support

From: Geoffrey Thomas
Date: Sun Aug 23 2020 - 17:54:57 EST

Next message: Rikard Falkeborn: "[PATCH 0/8] drivers: phy: Constify static phy_ops structs"
Previous message: Dmitry Osipenko: "Re: [PATCH 12/18] iommu/tegra-gart: Add IOMMU_DOMAIN_DMA support"
In reply to: Adrian Bunk: "Re: Linux kernel in-tree Rust support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, 24 Aug 2020, Adrian Bunk wrote:

In librsvg, breakages with more recent Rust versions in the past year
required updates of two vendored crates:

Interesting!

https://gitlab.gnome.org/GNOME/librsvg/-/commit/de26c4d8b192ed0224e6d38f54e429838608b902

Looks like this was, for a while, a warning and not an error:

https://github.com/servo/rust-cssparser/commit/3c98d22c5de3b696bf1fde2b6c90069812312aa6

= warning: this error has been downgraded to a warning for backwards compatibility with previous releases
= warning: this represents potential undefined behavior in your code and this warning will become a hard error in the future

https://gitlab.gnome.org/GNOME/librsvg/-/commit/696e4a6be2aeb00ea27945f94da066757431684d

Same here, and it looks like the same warning/error, too:

https://github.com/dimforge/nalgebra/issues/561

= warning: this error has been downgraded to a warning for backwards compatibility with previous releases
= warning: this represents potential undefined behavior in your code and this warning will become a hard error in the future

Following some links in that second issue, I see that this seems to be a summary of what's going on:

https://github.com/rust-lang/rust/issues/59159

in particular this paragraph: "at the time that NLL [non-lexical lifetimes] was stabilized, the compiler's acceptance of this borrowing pattern was categorized by the NLL team as a 'known bug'. The NLL team assumed that, as a bug fix, the compiler would be allowed to start rejecting the pattern in the future."

That is, both of these cases were code that should never have been accepted by the compiler because it is unsound, but the initial implementation of NLL was not clever enough to detect it. They later turned it into a warning, and later turned that into an error.

There is a link to this policy document which explains their process for breaking existing code in the case where it's necessary to fix a compiler bug or similar:

https://github.com/rust-lang/rfcs/blob/master/text/1589-rustc-bug-fix-procedure.md

There is also a link to this comment about why they decided to take this approach:

https://github.com/rust-lang/rust/pull/58739#issuecomment-476387184

which includes the followup task, "We do a retrospective and look to ways to alter our processes to better prevent this in the future." That is, it seems to me that the Rust team sees it as a mistake that they ended up in this situation.

Josh, do you know if that retrospective has happened? (I see some mumbling about NLL -> Polonius, can we be confident something similar won't happen with that? :) )

For updating Rust in Debian stable for the next Firefox ESR update it
would actually be useful if these violations of the "hard stability
guarantee" in Rust get fixed, so that the old librsvg 2.44.10 builds
again with the latest Rust.

It also makes me wonder how such regressions slip into Rust releases.

I do conceptually agree with this, even though it is not technically a "regression" (because it was really the old compiler that was buggy, not the new one). If I understand it right, neither of these lines of code were valid with the pre-NLL implementation either. They were only accepted by the initial NLL implementation. While the purpose of NLL was to enable writing (valid/sound) code that wouldn't have been accepted by the previous, simpler implementation of lifetimes, it seems like it should have been possible to opt out of it while there were "known bugs" precisely to prevent these sort of effective-regressions. (And I suspect that it was in fact possible to do so, but perhaps not documented clearly / perhaps there wasn't an awareness that this was a thing that users who deeply value stability would want to do.)

And yeah, I can see the value of a --accept-previously-accepted-buggy-code flag from the distros' point of view (even if I can also see why Rust upstream would not be super excited about it :) ). After all, if the choice is between the distro not taking _any_ bug fixes in rustc and the distro taking all but one, the latter seems like a better option.

The other takeaway, I think, is that you should regularly compile with warnings turned into hard errors. The policy document above (Rust RFC 1589) says that all such changes need to be a warning and not a hard error for at least one compiler reversion. That happened in this case, and both fixes were applied in the vendored crates while they were still warnings. For any kernel Rust use, I'd strongly advocate for running CI on every stable branch after each compiler release, preferably after each compiler beta release, and shaking out any warnings - even if they are not warnings the compiler will turn into errors on its own, they may still point to logic bugs in the code.

(The Rust folks have some automation named "crater" for running these sorts of tests across the Rust ecosystem, which the document mentions. I'd expect that anything shipped in major distros, including librsvg + its dependencies as well as any Rust in the kernel, should be included in crater.)

--
Geoffrey Thomas
https://ldpreload.com
geofft@xxxxxxxxxxxxx

Next message: Rikard Falkeborn: "[PATCH 0/8] drivers: phy: Constify static phy_ops structs"
Previous message: Dmitry Osipenko: "Re: [PATCH 12/18] iommu/tegra-gart: Add IOMMU_DOMAIN_DMA support"
In reply to: Adrian Bunk: "Re: Linux kernel in-tree Rust support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]