Re: Regression on ARMs in next-20170531

From: Russell King - ARM Linux
Date: Wed May 31 2017 - 13:45:00 EST


On Wed, May 31, 2017 at 09:45:45AM -0700, Tony Lindgren wrote:
> Mark Brown noticed that so far the only booting
> ARMs are all with CONFIG_SMP disabled and I just
> confirmed that's the case.

> 8< --------------------
> Unable to handle kernel paging request at virtual address 2e116007
> pgd = c0004000
> [2e116007] *pgd=00000000
> Internal error: Oops: 5 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #200
> Hardware name: Generic DRA74X (Flattened Device Tree)
> task: c0d0adc0 task.stack: c0d00000
> PC is at __mod_node_page_state+0x2c/0xc8
> LR is at __per_cpu_offset+0x0/0x8
> pc : [<c0271de8>] lr : [<c0d07da4>] psr: 600000d3
> sp : c0d01eec ip : 00000000 fp : c15782f4
> r10: 00000000 r9 : c1591280 r8 : 00004000
> r7 : 00000001 r6 : 00000006 r5 : 2e116000 r4 : 00000007
> r3 : 00000007 r2 : 00000001 r1 : 00000006 r0 : c0dc27c0
> Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment none
...
> Code: e79e5103 e28c3001 e0833001 e1a04003 (e19440d5)

This disassembles to:

0: e79e5103 ldr r5, [lr, r3, lsl #2]
4: e28c3001 add r3, ip, #1
8: e0833001 add r3, r3, r1
c: e1a04003 mov r4, r3
10: e19440d5 ldrsb r4, [r4, r5]

I don't have a similarly configured kernel, but here is what I have
for the start of this function:

00000680 <__mod_node_page_state>:
680: e1a0c00d mov ip, sp
684: e92dd870 push {r4, r5, r6, fp, ip, lr, pc}
688: e24cb004 sub fp, ip, #4
68c: e590cc00 ldr ip, [r0, #3072] ; 0xc00
690: e1a0400d mov r4, sp
694: ee1d6f90 mrc 15, 0, r6, cr13, cr0, {4}
698: e08c5001 add r5, ip, r1
69c: e2855001 add r5, r5, #1
6a0: e1a03005 mov r3, r5
6a4: e196c0dc ldrsb ip, [r6, ip]
6a8: e19630d3 ldrsb r3, [r6, r3]

Mapping your registers onto mine: your r5 is my r6, your r4 is my r3,
and your r3 is my r5. lr points at the __per_cpu_offset array, so the
first instruction is loading this CPU's percpu offset.

The faulting code is:

x = delta + __this_cpu_read(*p);

specifically "__this_cpu_read(*p)".

"ip" holds "pcp" from:

struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;

and you may notice that it's zero in the register dump. So,
pgdat->per_cpu_nodestats is NULL here.

This seems to be set up in setup_per_cpu_pageset(), which, in the init
order, runs well after mm_init() (which calls kmem_cache_init()).

So, this looks to me like an init ordering bug. I'm not sure why
non-SMP would be working - maybe it's only working because it's
managing to scribble over some memory that isn't faulting? I suspect a
WARN_ON(!pcp) here will report even on non-SMP.
