LINUX_VERSION_CODE overflow (was: Re: Linux 4.9.256)

From: Florian Weimer
Date: Thu Feb 11 2021 - 06:00:36 EST


* Greg Kroah-Hartman:

> I'm announcing the release of the 4.9.256 kernel.
>
> This, and the 4.4.256 release are a little bit "different" than normal.
>
> This contains only 1 patch, just the version bump from .255 to .256
> which ends up causing the userspace-visible LINUX_VERSION_CODE to
> behave a bit differently than normal due to the "overflow".
>
> With this release, KERNEL_VERSION(4, 9, 256) is the same as KERNEL_VERSION(4, 10, 0).
>
> Nothing in the kernel build itself breaks with this change, but given
> that this is a userspace visible change, and some crazy tools (like
> glibc and gcc) have logic that checks the kernel version for different
> reasons, I wanted to do this release as an "empty" release to ensure
> that everything still works properly.

As promised, I looked at this from the glibc perspective.

A dynamically linked glibc reads the LINUX_VERSION_CODE in the ELF note
in the vDSO.

Statically linked binaries use the uname system call and parse the
release field in struct utsname. If the uname system call fails, there
is also a /proc fallback, but I believe that path is unused.
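
To make the uname path concrete, here is a minimal sketch of turning a
release string into a packed version number. It is not glibc's actual
parser: the names parse_release and running_kernel_version are made up
for illustration, and the sketch assumes the components are combined
with the same arithmetic as the kernel's KERNEL_VERSION macro.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/utsname.h>

/* Hypothetical helper: pack up to three dot-separated numeric components
   of a release string such as "4.9.256-generic" into one integer.
   Assumption: components are added as major << 16, minor << 8, sublevel,
   with no range check, so a sublevel above 255 carries into the minor. */
static uint32_t parse_release(const char *release)
{
    uint32_t parts[3] = { 0, 0, 0 };
    const char *p = release;

    for (int i = 0; i < 3; i++) {
        char *end;

        if (!isdigit((unsigned char) *p))
            break;              /* no number where a component was expected */
        parts[i] = (uint32_t) strtoul(p, &end, 10);
        if (*end != '.')
            break;              /* a suffix such as "-rc1" ends the version */
        p = end + 1;
    }
    return (parts[0] << 16) + (parts[1] << 8) + parts[2];
}

/* Packed version of the running kernel, or 0 if uname fails. */
static uint32_t running_kernel_version(void)
{
    struct utsname uts;

    if (uname(&uts) != 0)
        return 0;
    return parse_release(uts.release);
}
```

Under that assumption, parse_release("4.9.256") and
parse_release("4.10.0") return the same value.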

The glibc dynamic linker falls back to uname if the vDSO cannot be
located.

The LINUX_VERSION_CODE format is also used in /etc/ld.so.cache. This is
difficult to change because a newer ldconfig is supposed to build a
cache that is compatible with older glibc versions (two-way
compatibility). The information in /etc/ld.so.cache is copied from the
ELF_NOTE_ABI/NT_GNU_ABI_TAG ELF note in the DSOs; the note format is not
subject to overflows because it uses 32-bit values for the component
versions.

glibc uses the current kernel's LINUX_VERSION_CODE for two purposes: for
its own “kernel too old” check (glibc refuses to start in this case),
and to skip loading DSOs which have an ELF_NOTE_ABI/NT_GNU_ABI_TAG that
indicates a higher kernel version than the current kernel. glibc does
not use LINUX_VERSION_CODE to detect features or activate workarounds
for kernel bugs.
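
The DSO selection described above can be sketched roughly as follows.
This is an illustration only, not the dynamic linker's code: struct
candidate and select_dso are hypothetical names, and the sketch assumes
each candidate's minimum kernel version is already available in packed
form.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one candidate DSO found on the search path. */
struct candidate {
    const char *path;
    uint32_t min_kernel;    /* packed version taken from the ABI tag note */
};

/* Return the first candidate, in search-path order, whose minimum kernel
   version is not newer than the running kernel, or NULL if none fits. */
static const struct candidate *
select_dso(const struct candidate *list, size_t count, uint32_t running)
{
    for (size_t i = 0; i < count; i++) {
        if (list[i].min_kernel <= running)
            return &list[i];
    }
    return NULL;
}
```

With a 4.10 build listed before a 2.6.32 build, a running version of
0x0409ff (4.9.255) skips the first entry and selects the second.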

The overflow from 4.9.256 to 4.10.0 means that we might get spurious
passes on these checks. In the worst case, if the system has a DSO in
two versions on the library search path, one for kernel 4.10 and one
for kernel 4.9 or earlier (in that order), we now load the 4.10 version
on a 4.9 kernel. Previously, loading the 4.10 DSO failed, and the
fallback version for earlier kernels was used. That would be real
breakage.
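
The arithmetic behind the overflow is easy to check. The sketch below
uses a macro named PACKED_VERSION (a made-up name, to avoid clashing
with the kernel header) with the same shifts as the traditional
KERNEL_VERSION definition: the sublevel gets eight bits, so 256 carries
into the minor field.

```c
/* Same packing as the traditional KERNEL_VERSION macro: eight bits each
   for minor and sublevel, with no range check on either. */
#define PACKED_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* 4.9.255 still sorts below 4.10.0 ... */
_Static_assert(PACKED_VERSION(4, 9, 255) < PACKED_VERSION(4, 10, 0),
               "4.9.255 is older than 4.10.0");

/* ... but 4.9.256 is indistinguishable from it: both are 0x040a00. */
_Static_assert(PACKED_VERSION(4, 9, 256) == PACKED_VERSION(4, 10, 0),
               "sublevel 256 carries into the minor field");
```

Any "running kernel >= required kernel" comparison against 4.10.0
therefore starts to pass once a 4.9 kernel reports sublevel 256.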

Our options in userspace are limited because whatever changes we make to
glibc today are unlikely to reach people running 4.4 or 4.9 kernels
anytime soon, if ever. Clamping the sublevel field of
LINUX_VERSION_CODE in the vDSO to 255 only benefits dynamically linked
binaries, but it could be that this is sufficient to paper over this
issue.
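
As a rough sketch of what clamping means here (clamped_version_code is
a hypothetical helper, not an existing kernel or glibc interface), the
sublevel would be saturated at 255 before packing, so that no 4.9.x
release can compare equal to or above 4.10.0:

```c
#include <stdint.h>

/* Hypothetical helper: pack a version with the sublevel saturated at 255,
   so that a large sublevel never carries into the minor field. */
static uint32_t
clamped_version_code(uint32_t major, uint32_t minor, uint32_t sublevel)
{
    if (sublevel > 255)
        sublevel = 255;
    return (major << 16) + (minor << 8) + sublevel;
}
```

The cost is that 4.9.255 and every later 4.9.x release report the same
code, which should not matter for checks that only compare against
minor-version boundaries.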

There's also the question of whether these glibc checks are valuable at
all. They encourage patching kernels to lie about their versions, which
makes diagnostics harder (e.g., reporting 3.10 if it's really a 2.6.32
with lots of system call backports). The ELF_NOTE_ABI/NT_GNU_ABI_TAG DSO
selection is known to cause endless problems with Qt, basically the only
large-scale user of this feature. Perhaps we should remove it, but that
would also break the fallback DSO approach mentioned above.

Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill