Re: [PATCH 2/7] x86/percpu: Clean up percpu_to_op()

From: Brian Gerst
Date: Thu May 21 2020 - 09:06:29 EST


On Wed, May 20, 2020 at 1:26 PM Nick Desaulniers
<ndesaulniers@xxxxxxxxxx> wrote:
>
> On Mon, May 18, 2020 at 8:38 PM Brian Gerst <brgerst@xxxxxxxxx> wrote:
> >
> > On Mon, May 18, 2020 at 5:15 PM Nick Desaulniers
> > <ndesaulniers@xxxxxxxxxx> wrote:
> > >
> > > On Sun, May 17, 2020 at 8:29 AM Brian Gerst <brgerst@xxxxxxxxx> wrote:
> > > >
> > > > The core percpu macros already have a switch on the data size, so the switch
> > > > in the x86 code is redundant and produces more dead code.
> > > >
> > > > Also use appropriate types for the width of the instructions. This avoids
> > > > errors when compiling with Clang.
> > > >
> > > > Signed-off-by: Brian Gerst <brgerst@xxxxxxxxx>
> > > > ---
> > > > arch/x86/include/asm/percpu.h | 90 ++++++++++++++---------------------
> > > > 1 file changed, 35 insertions(+), 55 deletions(-)
> > > >
> > > > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > > > index 89f918a3e99b..233c7a78d1a6 100644
> > > > --- a/arch/x86/include/asm/percpu.h
> > > > +++ b/arch/x86/include/asm/percpu.h
> > > > @@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
> > > > #define __pcpu_reg_imm_4(x) "ri" (x)
> > > > #define __pcpu_reg_imm_8(x) "re" (x)
> > > >
> > > > -#define percpu_to_op(qual, op, var, val) \
> > > > -do { \
> > > > - typedef typeof(var) pto_T__; \
> > > > - if (0) { \
> > > > - pto_T__ pto_tmp__; \
> > > > - pto_tmp__ = (val); \
> > > > - (void)pto_tmp__; \
> > > > - } \
> > > > - switch (sizeof(var)) { \
> > > > - case 1: \
> > > > - asm qual (op "b %1,"__percpu_arg(0) \
> > > > - : "+m" (var) \
> > > > - : "qi" ((pto_T__)(val))); \
> > > > - break; \
> > > > - case 2: \
> > > > - asm qual (op "w %1,"__percpu_arg(0) \
> > > > - : "+m" (var) \
> > > > - : "ri" ((pto_T__)(val))); \
> > > > - break; \
> > > > - case 4: \
> > > > - asm qual (op "l %1,"__percpu_arg(0) \
> > > > - : "+m" (var) \
> > > > - : "ri" ((pto_T__)(val))); \
> > > > - break; \
> > > > - case 8: \
> > > > - asm qual (op "q %1,"__percpu_arg(0) \
> > > > - : "+m" (var) \
> > > > - : "re" ((pto_T__)(val))); \
> > > > - break; \
> > > > - default: __bad_percpu_size(); \
> > > > - } \
> > > > +#define percpu_to_op(size, qual, op, _var, _val) \
> > > > +do { \
> > > > + __pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val); \
> > > > + if (0) { \
> > > > + typeof(_var) pto_tmp__; \
> > > > + pto_tmp__ = (_val); \
> > > > + (void)pto_tmp__; \
> > > > + } \
> > >
> > > Please replace the whole `if (0)` block with:
> > > ```c
> > > __same_type(_var, _val);
> > > ```
> > > from include/linux/compiler.h.
> >
> > The problem with __builtin_types_compatible_p() is that it considers
> > unsigned long and u64 (aka unsigned long long) as different types even
> > though they are the same width on x86-64. While this may be a good
> > cleanup to look at in the future, it's not a simple drop-in
> > replacement.
>
> Does it trigger errors in this case?

Yes, see boot_init_stack_canary(). That code looks a bit sketchy but
it's not wrong, for x86-64 at least.

It also doesn't seem to like "void *" compared to any other pointer type:

In function ‘fpregs_deactivate’,
inlined from ‘fpu__drop’ at arch/x86/kernel/fpu/core.c:285:3:
./include/linux/compiler.h:379:38: error: call to
‘__compiletime_assert_317’ declared with attribute error: BUILD_BUG_ON
failed: !__same_type((fpu_fpregs_owner_ctx), ((void *)0))
379 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
./include/linux/compiler.h:360:4: note: in definition of macro
‘__compiletime_assert’
360 | prefix ## suffix(); \
| ^~~~~~
./include/linux/compiler.h:379:2: note: in expansion of macro
‘_compiletime_assert’
379 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro
‘compiletime_assert’
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^~~~~~~~~~~~~~~~
./arch/x86/include/asm/percpu.h:105:2: note: in expansion of macro
‘BUILD_BUG_ON’
105 | BUILD_BUG_ON(!__same_type(_var, _val)); \
| ^~~~~~~~~~~~
./arch/x86/include/asm/percpu.h:338:37: note: in expansion of macro
‘percpu_to_op’
338 | #define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile,
"mov", (pcp), val)
| ^~~~~~~~~~~~
./include/linux/percpu-defs.h:380:11: note: in expansion of macro
‘this_cpu_write_8’
380 | case 8: stem##8(variable, __VA_ARGS__);break; \
| ^~~~
./include/linux/percpu-defs.h:508:34: note: in expansion of macro
‘__pcpu_size_call’
508 | #define this_cpu_write(pcp, val)
__pcpu_size_call(this_cpu_write_, pcp, val)
| ^~~~~~~~~~~~~~~~
./arch/x86/include/asm/fpu/internal.h:525:2: note: in expansion of
macro ‘this_cpu_write’
525 | this_cpu_write(fpu_fpregs_owner_ctx, NULL);
| ^~~~~~~~~~~~~~

>
> It's interesting to know how this trick differs from
> __builtin_types_compatible_p(). Might even be helpful to wrap this
> pattern in a macro with a comment with the pros/cons of this approach
> vs __same_type.

I think the original code is meant more to catch a mismatch between
pointers and integers. It doesn't seem to care about truncation.
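
To make the difference concrete, here is a rough userspace sketch (not
kernel code). same_type() and assign_check() are made-up names standing
in for __same_type() and the `if (0)` block in percpu_to_op(); the two
helper functions exist only for illustration.

```c
/* Userspace stand-in for the kernel's __same_type() (hypothetical name). */
#define same_type(a, b) \
	__builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/*
 * Assignment-style check modelled on the `if (0)` block: the branch is
 * dead code, but the compiler still type-checks the assignment, so
 * mixing a pointer with an integer is diagnosed while integer width
 * differences and "void *" pass silently.
 */
#define assign_check(var, val)			\
do {						\
	if (0) {				\
		__typeof__(var) tmp__;		\
		tmp__ = (val);			\
		(void)tmp__;			\
	}					\
} while (0)

/* unsigned long vs unsigned long long: same width on x86-64, distinct types. */
static int ulong_same_as_ull(void)
{
	unsigned long a = 0;
	unsigned long long b = 0;

	assign_check(a, b);		/* accepted by the assignment check */
	return same_type(a, b);		/* the builtin reports a mismatch */
}

/* "int *" vs "void *": assignable in C, but not compatible types. */
static int intptr_same_as_voidptr(void)
{
	int *p = 0;

	assign_check(p, (void *)0);	/* accepted by the assignment check */
	return same_type(p, (void *)0);	/* the builtin reports a mismatch */
}
```

Both helpers return 0 with GCC and Clang, which matches the two failure
modes above: the builtin is stricter than plain assignment.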

> On the other hand, the use of `long` seems tricky in x86 code as x86
> (32b) is ILP32 but x86_64 (64b) is LP64. So the use of `long` is
> ambiguous in the sense that it's a different size depending on the
> target ABI. Wouldn't it potentially be a bug for x86 kernel code to
> use `long` percpu variables (or rather mix `long` and `long long` in
> the same operation) in that case, since the sizes of the two would be
> different for i386?

Not necessarily. Some things like registers are naturally 32-bit on a
32-bit kernel and 64-bit on a 64-bit kernel, so 'long' is appropriate
there.
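
As a small illustration (a userspace sketch, assuming an ILP32 or LP64
target such as Linux on i386 or x86-64; the helper names are made up),
'long' tracks the native pointer/register width while 'long long' does
not:

```c
#include <limits.h>

/* Width of 'long' in bits: 32 on ILP32, 64 on LP64. */
static int long_bits(void)
{
	return (int)(sizeof(long) * CHAR_BIT);
}

/* Width of a pointer in bits; matches 'long' on ILP32 and LP64. */
static int pointer_bits(void)
{
	return (int)(sizeof(void *) * CHAR_BIT);
}

/* Width of 'long long' in bits: at least 64 on every conforming target. */
static int long_long_bits(void)
{
	return (int)(sizeof(long long) * CHAR_BIT);
}
```

This would not hold on an LLP64 ABI, but that is not a case the kernel
has to deal with.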

--
Brian Gerst