Re: [PATCH 4/4] x86/percpu: Use C for percpu read/write accessors

From: Uros Bizjak
Date: Sun Oct 08 2023 - 15:18:21 EST


On Sun, Oct 8, 2023 at 8:00 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 4 Oct 2023 at 07:51, Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
> >
> > The percpu code mostly uses inline assembly. Using segment qualifiers
> > allows to use C code instead, which enables the compiler to perform
> > various optimizations (e.g. propagation of memory arguments). Convert
> > percpu read and write accessors to C code, so the memory argument can
> > be propagated to the instruction that uses this argument.
>
> So apparently this causes boot failures.
>
> It might be worth testing a version where this:
>
> > +#define raw_cpu_read_1(pcp) __raw_cpu_read(, pcp)
> > +#define raw_cpu_read_2(pcp) __raw_cpu_read(, pcp)
> > +#define raw_cpu_read_4(pcp) __raw_cpu_read(, pcp)
> > +#define raw_cpu_write_1(pcp, val) __raw_cpu_write(, pcp, val)
> > +#define raw_cpu_write_2(pcp, val) __raw_cpu_write(, pcp, val)
> > +#define raw_cpu_write_4(pcp, val) __raw_cpu_write(, pcp, val)
>
> and this
>
> > +#ifdef CONFIG_X86_64
> > +#define raw_cpu_read_8(pcp) __raw_cpu_read(, pcp)
> > +#define raw_cpu_write_8(pcp, val) __raw_cpu_write(, pcp, val)
>
> was all using 'volatile' in the qualifier argument and see if that
> makes the boot failure go away.
>
> Because while the old code wasn't "asm volatile", even just a *plain*
> asm() is certainly a lot more serialized than a normal access.
>
> For example, the asm() version of raw_cpu_write() used "+m" for the
> destination modifier, which means that if you did multiple percpu
> writes to the same variable, gcc would output multiple asm calls,
> because it would see the subsequent ones as reading the old value
> (even if they don't *actually* do so).
>
> That's admittedly really just because it uses a common macro for
> raw_cpu_write() and the updates (like the percpu_add() code), so the
> fact that it uses "+m" instead of "=m" is just a random odd artifact
> of the inline asm version, but maybe we have code that ends up working
> just by accident.
>
> Also, I'm not sure gcc re-orders asms wrt each other - even when they
> aren't marked volatile.
>
> So it might be worth at least a trivial "make everything volatile"
> test to see if that affects anything.

I have managed to reproduce the bug, and I'm trying the following path:

Scrap the last patch and just add:

#define __raw_cpu_read_new(size, qual, pcp) \
({ \
*(qual __my_cpu_type(pcp) *)__my_cpu_ptr(&(pcp)); \
})

#define __raw_cpu_read(size, qual, _var) \
({ \
__pcpu_type_##size pfo_val__; \
asm qual (__pcpu_op2_##size("mov", __percpu_arg([var]), "%[val]") \
: [val] __pcpu_reg_##size("=", pfo_val__) \
: [var] "m" (__my_cpu_var(_var))); \
(typeof(_var))(unsigned long) pfo_val__; \
})

Then changing *only*

#define raw_cpu_read_8(pcp) __raw_cpu_read_new(8, , pcp)

brings the boot process far enough to report:

[ 0.463711][ T0] Dentry cache hash table entries: 1048576
(order: 11, 8388608 bytes, linear)
[ 0.465859][ T0] Inode-cache hash table entries: 524288 (order:
10, 4194304 bytes, linear)
PANIC: early exception 0x0d IP 10:ffffffff810c4cb9 error 0 cr2
0xffff8881ab1ff000
[ 0.469084][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted
6.5.0-11417-gca4256348660-dirty #7
[ 0.470756][ T0] RIP: 0010:cpu_init_exception_handling+0x179/0x740
[ 0.472045][ T0] Code: be 0f 00 00 00 4a 03 04 ed 40 19 15 85 48
89 c7 e8 9c bb ff ff 48 c7 c0 10 73 02 00 48 ba 00 00 00 00 00 fc ff
df 48 c1 e8 03 <80> 3c 10
00 0f 85 21 05 00 00 65 48 8b 05 45 26 f6 7e 48 8d 7b 24
[ 0.475784][ T0] RSP: 0000:ffffffff85207e38 EFLAGS: 00010002
ORIG_RAX: 0000000000000000
[ 0.477384][ T0] RAX: 0000000000004e62 RBX: ffff88817700a000
RCX: 0000000000000010
[ 0.479093][ T0] RDX: dffffc0000000000 RSI: ffffffff85207e60
RDI: ffff88817700f078
[ 0.481178][ T0] RBP: 000000000000f000 R08: 0040f50000000000
R09: 0040f50000000000
[ 0.482655][ T0] R10: ffff8881ab02a000 R11: 0000000000000000
R12: 1ffffffff0a40fc8
[ 0.484128][ T0] R13: 0000000000000000 R14: 0000000000000000
R15: ffffffff85151940
[ 0.485604][ T0] FS: 0000000000000000(0000)
GS:ffff888177000000(0000) knlGS:0000000000000000
[ 0.487246][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.488515][ T0] CR2: ffff8881ab1ff000 CR3: 00000000052d7000
CR4: 00000000000000b0
[ 0.490002][ T0] Call Trace:
[ 0.490600][ T0] <TASK>
[ 0.491145][ T0] ? early_fixup_exception+0x10e/0x280
[ 0.492176][ T0] ? early_idt_handler_common+0x2f/0x40
[ 0.493222][ T0] ? cpu_init_exception_handling+0x179/0x740
[ 0.494348][ T0] ? cpu_init_exception_handling+0x164/0x740
[ 0.495472][ T0] ? syscall_init+0x1c0/0x1c0
[ 0.496351][ T0] ? per_cpu_ptr_to_phys+0x1ca/0x2c0
[ 0.497336][ T0] ? setup_cpu_entry_areas+0x138/0x980
[ 0.498365][ T0] trap_init+0xa/0x40

Let me see what happens here. I have changed *only* raw_cpu_read_8,
but the GP fault is reported in cpu_init_exception_handling, which
uses this_cpu_ptr. Please note that all per-cpu initializations go
through existing {this|raw}_cpu_write.

void cpu_init_exception_handling(void)
{
struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
int cpu = raw_smp_processor_id();
...

I have tried the trick with all reads volatile (and writes as they
were before the patch), but it didn't make a difference. Also, the
kernel from the git/tip branch works OK for default config, so I think
there is some config option that somehow disturbs the named-as enabled
kernel.

Uros.