Re: [tip: core/urgent] compiler.h: Avoid the usage of __typeof_unqual__() when __GENKSYMS__ is defined

From: Uros Bizjak
Date: Sun Apr 13 2025 - 04:27:27 EST


On Sat, Apr 12, 2025 at 10:55 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
>
> * Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> > On Thu, Apr 10, 2025 at 10:58:46AM -0000, tip-bot2 for Uros Bizjak wrote:
> > > The following commit has been merged into the core/urgent branch of tip:
> > >
> > > Commit-ID: e696e5a114b59035f5a889d5484fedec4f40c1f3
> > > Gitweb: https://git.kernel.org/tip/e696e5a114b59035f5a889d5484fedec4f40c1f3
> > > Author: Uros Bizjak <ubizjak@xxxxxxxxx>
> > > AuthorDate: Fri, 04 Apr 2025 12:24:37 +02:00
> > > Committer: Borislav Petkov (AMD) <bp@xxxxxxxxx>
> > > CommitterDate: Thu, 10 Apr 2025 12:44:27 +02:00
> > >
> > > compiler.h: Avoid the usage of __typeof_unqual__() when __GENKSYMS__ is defined
> > >
> > > Current version of genksyms doesn't know anything about __typeof_unqual__()
> > > operator. Avoid the usage of __typeof_unqual__() with genksyms to prevent
> > > errors when symbols are versioned.
> > >
> > > There were no problems with gendwarfksyms.
> > >
> > > Fixes: ac053946f5c40 ("compiler.h: introduce TYPEOF_UNQUAL() macro")
> > > Closes: https://lore.kernel.org/lkml/81a25a60-de78-43fb-b56a-131151e1c035@xxxxxxxxxxxxx/
> > > Reported-by: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
> > > Signed-off-by: Uros Bizjak <ubizjak@xxxxxxxxx>
> > > Signed-off-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
> > > Tested-by: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
> > > Link: https://lore.kernel.org/r/20250404102535.705090-1-ubizjak@xxxxxxxxx
> > > ---
> > > include/linux/compiler.h | 6 +++---
> > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > > index 27725f1..98057f9 100644
> > > --- a/include/linux/compiler.h
> > > +++ b/include/linux/compiler.h
> > > @@ -229,10 +229,10 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> > > /*
> > > * Use __typeof_unqual__() when available.
> > > *
> > > - * XXX: Remove test for __CHECKER__ once
> > > - * sparse learns about __typeof_unqual__().
> > > + * XXX: Remove test for __GENKSYMS__ once "genksyms" handles
> > > + * __typeof_unqual__(), and test for __CHECKER__ once "sparse" handles it.
> > > */
> > > -#if CC_HAS_TYPEOF_UNQUAL && !defined(__CHECKER__)
> > > +#if CC_HAS_TYPEOF_UNQUAL && !defined(__GENKSYMS__) && !defined(__CHECKER__)
> > > # define USE_TYPEOF_UNQUAL 1
> > > #endif
> >
> > So mingo is right - this is not really a fix but a brown-paper bag of
> > sorts.
>
> Yeah, agreed, I've removed this workaround from tip:core/urgent for the
> time being - it's not like genksyms is some magic external entity we
> have to work around, it's an in-kernel tool that can be fixed/enhanced
> in scripts/genksyms/.

Please note that you will disable a check that is finally able to fail
the build for a whole class of very subtle percpu bugs. This
functionality was discussed and specifically requested at [1] and [2]
by Thomas [CC'd]:

--q--
Coming back to __percpu. As I mentioned in the original thread it's a sad
state of affairs that the only way to detect the __percpu fails is sparse
or some other static checker:

https://lore.kernel.org/all/87bk7vuldh.ffs@tglx

But that's a different problem to solve and does not invalidate the fixes
which come with this series in any way.

If the compiler people would have provided a way to utilize the anyway
non-standard name space support in a useful way, I could have spared the
time to bang my head agaist the wall simply because this would have failed
to build in the first place long ago. That just makes me sad.

After wading through this, I really ask the 0-day people to push hard on
any sparse fallout which involves __percpu. The resulting failures can be
truly subtle and not necessarily fatal right away.
--/q--

and

--q--
I did not follow the __set_gs work closely, so I don't know whether Uros
ever tried to actually mark the per CPU variable __set_gs right away,
which would obviously catch the above 'foo' nonsense.

I think this should just work, but that would obviously require to do
the type cast magic at the EXPORT_SYMBOL side and in some other places.
--/q--

[1] https://lore.kernel.org/lkml/20240303235029.555787150@xxxxxxxxxxxxx/
[2] https://lore.kernel.org/lkml/87edcruvja.ffs@tglx/

My proposed "workaround" patch (quoted above) will fallback to
generate a preprocessed source that both, sparse and genksyms
understand *without any loss of functionality*. This exact source is
what these tools currently process, and as shown in the message trail,
the proposed patch fully restores the existing functionality.
Effectively, genksyms would generate the same CRC if it would match
typeof_unqual keyword with typeof and ignore __seg_gs keyword.

The effectiveness of the check can be seen in the pull request for MM
updates [3]:

--q--
- The 6 patch series "Enable strict percpu address space checks" from
Uros Bizjak uses x86 named address space qualifiers to provide
compile-time checking of percpu area accesses.

This has caused a small amount of fallout - two or three issues were
reported. In all cases the calling[sic] code was founf[sic] to be incorrect.
--/q--

[3] https://lore.kernel.org/lkml/20250330165732.f4c1493615375623f67e38eb@xxxxxxxxxxxxxxxxxxxx/

The patch that disables the above checks trades enablement of this new
functionality for the request that genksyms (and sparse?) fully
support the new keyword first. A workaround was promptly developed
that allows the external tools to work without any loss of
functionality, and allows these tools to be independently improved at
any later time. Please also note that genksyms is already partially
obsoleted by and substituted in allyesconfig by gendwarfksyms, this
was the reason the issue was found only at rc1 time.

Also, I don't think that the issue falls into the "brown-paper bag"
bug category at all. The unfixed source results in a minor loss of
functionality (percpu symbols are not versioned), was detected at rc1
and promptly fixed (worked around). The issue can't be detected by any
standardized approach (allyesconfig compile), and for the core change
of this magnitude, some fall-out is to be expected during integration
(rc1) time. FYI - more than 30 patches were needed to fix percpu
issues in the kernel source (many of these issues were not found by
sparse) before checks were enabled and the enablement was already
postponed for one release to fix an issue, triggered by some extremely
dubious usage by external tools (rust).

Based on the above facts, I'd like to ask maintainers to reconsider
their decision.

Uros.