Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end

From: Serge Semin
Date: Tue Mar 02 2021 - 11:22:01 EST


On Mon, Mar 01, 2021 at 08:09:52PM -0800, Florian Fainelli wrote:
>
>
> On 3/1/2021 1:22 AM, Serge Semin wrote:
> > On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> >> Hi Serge,
> >>
> >> On 2/28/2021 3:08 PM, Serge Semin wrote:
> >>> Hi folks,
> >>> What you've got here seems to be a more complicated problem than it
> >>> might originally appear. Please see my comments below.
> >>>
> >>> (Note I've discarded some of the email logs, which are of no interest
> >>> to the discovered problem. Please also note that I haven't got any
> >>> Broadcom hardware to test out the solution suggested below.)
> >>>
> >>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >>>> Hi Mike,
> >>>>
> >>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>>>> Hi Florian,
> >>>>>
> >>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>>>
> >>>
> >>>>>> [...]
> >>>
> >>>>>>
> >>>>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>>>
> >>>>>> Kamal and I have been unable to boot v5.11 on MIPS since this
> >>>>>> commit; reverting it makes our MIPS platforms boot successfully. We do
> >>>>>> not see a warning like the one in the commit message. Instead, what
> >>>>>> happens appears to be a corrupted Device Tree, which prevents the parsing
> >>>>>> of the "rdb" node and leads to the interrupt controllers not being
> >>>>>> registered, and the system eventually not booting.
> >>>>>>
> >>>>>> The Device Tree is built-into the kernel image and resides at
> >>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>>>
> >>>>>> Do you have any idea what could be wrong with MIPS specifically here?
> >>>
> >>> Most likely the problem you've discovered has been there for quite
> >>> some time. The patch you are referring to just caused it to be
> >>> triggered by extending the early allocation range. Before that
> >>> patch was accepted, the early memory allocations were performed
> >>> in the range:
> >>> [kernel_end, RAM_END].
> >>> The patch changed that, so the early allocations are now done within
> >>> [RAM_START + PAGE_SIZE, RAM_END].
> >>>
> >>> In normal situations it's safe to do that as long as all the critical
> >>> memory regions (including the memory residing below the
> >>> kernel) have been reserved. But as soon as memory holding some critical
> >>> structures hasn't been reserved, the kernel may allocate it, for
> >>> instance for early initializations, with unpredictable and
> >>> most of the time unpleasant consequences.
> >>>
> >>>>>
> >>>>> Apparently there is a memblock allocation in one of the functions called
> >>>>> from arch_mem_init() between plat_mem_setup() and
> >>>>> early_init_fdt_reserve_self().
> >>>
> >>> Mike, alas, according to the log provided by Florian that's not the cause
> >>> of the problem. Please see my considerations below.
> >>>
> >>>> [...]
> >>>>
> >>>> [ 0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> >>>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> >>>> Feb 28 10:01:50 PST 2021
> >>>> [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >>>> [ 0.000000] FPU revision is: 00130001
> >>>
> >>>> [ 0.000000] memblock_add: [0x00000000-0x0fffffff]
> >>>> early_init_dt_scan_memory+0x160/0x1e0
> >>>> [ 0.000000] memblock_add: [0x20000000-0x4fffffff]
> >>>> early_init_dt_scan_memory+0x160/0x1e0
> >>>> [ 0.000000] memblock_add: [0x90000000-0xcfffffff]
> >>>> early_init_dt_scan_memory+0x160/0x1e0
> >>>
> >>> Here the memory has been added to the memblock allocator.
> >>>
> >>>> [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >>>> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >>>> [ 0.000000] printk: bootconsole [ns16550a0] enabled
> >>>
> >>>> [ 0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> >>>> setup_arch+0x128/0x69c
> >>>
> >>> Here the fdt memory has been reserved. (Note it's built into the
> >>> kernel.)
> >>>
> >>>> [ 0.000000] memblock_reserve: [0x00010000-0x018313cf]
> >>>> setup_arch+0x1f8/0x69c
> >>>
> >>> Here the kernel itself, together with the built-in dtb, has been reserved.
> >>> So far so good.
> >>>
> >>>> [ 0.000000] Initrd not found or empty - disabling initrd
> >>>
> >>>> [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [ 0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [ 0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>
> >>> The log above most likely belongs to the call-chain:
> >>> setup_arch()
> >>> +-> arch_mem_init()
> >>>     +-> device_tree_init() - BMIPS specific method
> >>>         +-> unflatten_and_copy_device_tree()
> >>>
> >>> In other words, here we've copied the fdt from the original space
> >>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> >>> it to [0x00003aa4-0x0000ba4b].
> >>>
> >>> The problem is that a bit later the next call-chain is performed:
> >>> setup_arch()
> >>> +-> plat_smp_setup()
> >>>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >>>         +-> if (!board_ebase_setup)
> >>>                 board_ebase_setup = &bmips_ebase_setup;
> >>>
> >>> So when the CPU traps are initialized, the bmips_ebase_setup()
> >>> method is called. What trap_init() does isn't compatible with the
> >>> allocation performed by the unflatten_and_copy_device_tree() method.
> >>> See the next comment.
> >>>
> >>>> [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [ 0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_reserve: [0x0096a000-0x00969fff]
> >>>> setup_arch+0x3fc/0x69c
> >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [ 0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [ 0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [ 0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
> >>>> bytes.
> >>>> [ 0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
> >>>> linesize 32 bytes
> >>>> [ 0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
> >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [ 0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [ 0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [ 0.000000] memblock_reserve: [0x0000e000-0x0000efff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] Zone ranges:
> >>>> [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
> >>>> [ 0.000000] HighMem [mem 0x0000000010000000-0x00000000cfffffff]
> >>>> [ 0.000000] Movable zone start for each node
> >>>> [ 0.000000] Early memory node ranges
> >>>> [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
> >>>> [ 0.000000] node 0: [mem 0x0000000020000000-0x000000004fffffff]
> >>>> [ 0.000000] node 0: [mem 0x0000000090000000-0x00000000cfffffff]
> >>>> [ 0.000000] Initmem setup node 0 [mem
> >>>> 0x0000000000000000-0x00000000cfffffff]
> >>>> [ 0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> alloc_node_mem_map.constprop.135+0x6c/0xc8
> >>>> [ 0.000000] memblock_reserve: [0x01831400-0x032313ff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
> >>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >>>> [ 0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
> >>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >>>> [ 0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] MEMBLOCK configuration:
> >>>> [ 0.000000] memory size = 0x80000000 reserved size = 0x0322f032
> >>>> [ 0.000000] memory.cnt = 0x3
> >>>> [ 0.000000] memory[0x0] [0x00000000-0x0fffffff], 0x10000000
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] memory[0x1] [0x20000000-0x4fffffff], 0x30000000
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] memory[0x2] [0x90000000-0xcfffffff], 0x40000000
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] reserved.cnt = 0xa
> >>>> [ 0.000000] reserved[0x0] [0x00001000-0x00003aa0], 0x00002aa1
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x1] [0x00003aa4-0x0000ba64], 0x00007fc1
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x2] [0x0000ba80-0x0000ba9f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x3] [0x0000bb00-0x0000bb1f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x4] [0x0000bb80-0x0000bb9f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x5] [0x0000bc00-0x0000bc1f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x6] [0x0000bc80-0x0000bdff], 0x00000180
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x7] [0x0000c000-0x0000efff], 0x00003000
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x8] [0x00010000-0x018313cf], 0x018213d0
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x9] [0x01831400-0x032313ff], 0x01a00000
> >>>> bytes flags: 0x0
> >>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
> >>>> [ 0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
> >>>> [ 0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
> >>>> [ 0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
> >>>> [ 0.000000] memblock_reserve: [0x03231400-0x032323ff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
> >>>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
> >>>> [ 0.000000] memblock_reserve: [0x03233000-0x0327afff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_free: [0x03245000-0x03244fff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [ 0.000000] memblock_free: [0x03257000-0x03256fff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [ 0.000000] memblock_free: [0x03269000-0x03268fff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [ 0.000000] memblock_free: [0x0327b000-0x0327afff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [ 0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
> >>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
> >>>> [ 0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
> >>>> [ 0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
> >>>> [ 0.000000] memblock_reserve: [0x03232400-0x0323240f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
> >>>> [ 0.000000] memblock_reserve: [0x03232480-0x0323248f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
> >>>> [ 0.000000] memblock_reserve: [0x03232500-0x0323257f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
> >>>> [ 0.000000] memblock_reserve: [0x03232580-0x032325db]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
> >>>> [ 0.000000] memblock_reserve: [0x03232600-0x032328ff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
> >>>> [ 0.000000] memblock_reserve: [0x03232900-0x03232c03]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
> >>>> [ 0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_free: [0x0000f000-0x0000ffff]
> >>>> pcpu_embed_first_chunk+0x838/0x884
> >>>> [ 0.000000] memblock_free: [0x03231400-0x032323ff]
> >>>> pcpu_embed_first_chunk+0x850/0x884
> >>>> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 523776
> >>>> [ 0.000000] Kernel command line: console=ttyS0,115200 earlycon
> >>>> [ 0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >>>> [ 0.000000] memblock_reserve: [0x0327b000-0x0329afff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
> >>>> bytes, linear)
> >>>> [ 0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >>>> [ 0.000000] memblock_reserve: [0x0329b000-0x032aafff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> >>>> bytes, linear)
> >>>
> >>>> [ 0.000000] memblock_reserve: [0x00000000-0x000003ff]
> >>>> trap_init+0x70/0x4e8
> >>>
> >>> Most likely the corruption has happened somewhere around here. The log above
> >>> has just reserved the memory for the NMI/reset vectors:
> >>> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> >>>
> >>> But then the board_ebase_setup() pointer is dereferenced and called,
> >>> which was initialized with bmips_ebase_setup() earlier and which
> >>> overwrites the ebase variable with 0x80001000, as this is a
> >>> CPU_BMIPS5000 CPU. So any further calls of functions like
> >>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc. may cause
> >>> corruption of the memory above 0x80001000, which as we have discovered
> >>> belongs to the fdt and the unflattened device tree.
> >>>
> >>>> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >>>> [ 0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> >>>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> >>>> cma-reserved, 1835008K highmem)
> >>>> [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >>>> [ 0.000000] rcu: Hierarchical RCU implementation.
> >>>> [ 0.000000] rcu: RCU event tracing is enabled.
> >>>> [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> >>>> is 25 jiffies.
> >>>> [ 0.000000] NR_IRQS: 256
> >>>
> >>>> [ 0.000000] OF: Bad cell count for /rdb
> >>>> [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >>>> [ 0.000000] OF: of_irq_init: children remain, but no parents
> >>>
> >>> So here is the first time the consequence of the corruption
> >>> has popped up. Luckily it's just the "Bad cell count" error. We could have
> >>> got a much less obvious log here, up to a crash at some place
> >>> further on...
> >>>
> >>>> [ 0.000000] random: get_random_bytes called from
> >>>> start_kernel+0x444/0x654 with crng_init=0
> >>>> [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> >>>> wraps every 8589934590000000ns
> >>>
> >>>>
> >>>> and with your patch applied, which unfortunately did not work, we have the
> >>>> following:
> >>>>
> >>>> [...]
> >>>
> >>> So a patch like this should work around the corruption:
> >>>
> >>> --- a/arch/mips/bmips/setup.c
> >>> +++ b/arch/mips/bmips/setup.c
> >>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >>>
> >>>  	__dt_setup_arch(dtb);
> >>>
> >>> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> >>> +
> >>>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >>>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >>>  					     q->compatible)) {
> >>
> >
> >> This patch works, thanks a lot for the troubleshooting and analysis! How
> >> about the following, which also works and should be more generic,
> >> since it does not require each architecture to
> >> provide an appropriate call to memblock_reserve():
> >
> > Hm, are you sure it's working?
>
> I was until I noticed that I was working on top of a revert of Roman's
> patch, sorry about the brain fart here.
>
> > If so, my analysis hasn't been quite
> > correct. My suggestion was based on the trace of memory initializations,
> > allocations and reservations. So here is the sequence of the most
> > crucial of them:
> > 1) Memblock initialization:
> > start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
> > (At this point I suggested to place the exceptions memory
> > reservation.)
> > 2) Base FDT memory reservation:
> > start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
> > 3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
> > reservation:
> > start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
> > 4) Reserve kernel itself, some critical sections like initrd and
> > crash-kernel:
> > start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
> > 5) Copy and unflatten the device tree built into the kernel
> > (BMIPS-platform code):
> > start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
> > This is the very first time an allocation from the memblock pool
> > is performed. Since we haven't reserved the memory for the exception
> > vectors yet, the memblock allocator is free to return that memory
> > range for any other use. Needless to say, if we try to use that memory
> > later without consulting memblock, we may, and in our case
> > will, get into trouble.
> > 6) Many random early memblock allocations for kernel use before
> > buddy and sl*b allocators are up and running...
> > Note that if by some luck the allocations made in 5) didn't
> > overlap the exception memory, here we have many more chances to
> > do that, with obviously fatal consequences of the two users treating
> > the range as their own.
> > 7) Trap/exception vectors initialization and !memory reservation! for
> > them:
> > start_kernel()->trap_init()
> > Only at this point we get to reserve the memory for the vectors.
> > 8) Init and run buddy/sl*b allocators:
> > start_kernel()->mm_init()->...mem_init()...
> >
> > There are a lot of allocations done in 5) and 6) before the
> > trap_init() is called in 7). You can see that in your log. That's why
> > I have doubts that your patch worked well. Most likely you've
> > forgotten to revert the workaround suggested by me in the previous
> > message. Could you make sure that you didn't and re-test your patch
> > again? If it still works then I might have confused something and it's
> > strange that my patch worked in the first place...
>
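
To illustrate the ordering issue in steps 5)-7) quoted above, here is a
toy model of a bottom-up first-fit allocator. It is only a sketch and not
the memblock code: all toy_* names are made up for the example, and the
only behaviour borrowed from the discussion is that bottom-up allocations
start above the first PAGE_SIZE and skip reserved ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names are kernel APIs. */
#define TOY_MAX_REGIONS 16
#define TOY_PAGE_SIZE   0x1000u   /* assumption: 4 KiB pages */

struct toy_region { uint32_t base, size; };

struct toy_memblock {
	uint32_t ram_end;                            /* exclusive end of RAM */
	size_t cnt;                                  /* used entries below */
	struct toy_region reserved[TOY_MAX_REGIONS];
};

/* Record [base, base + size) as reserved; returns 0 on success. */
static int toy_reserve(struct toy_memblock *mb, uint32_t base, uint32_t size)
{
	if (mb->cnt == TOY_MAX_REGIONS)
		return -1;
	mb->reserved[mb->cnt].base = base;
	mb->reserved[mb->cnt].size = size;
	mb->cnt++;
	return 0;
}

/*
 * Bottom-up first fit: lowest aligned address >= TOY_PAGE_SIZE that does
 * not overlap any reserved region. Returns 0 on failure (address 0 is
 * never handed out in this model).
 */
static uint32_t toy_alloc(struct toy_memblock *mb, uint32_t size, uint32_t align)
{
	uint32_t addr = TOY_PAGE_SIZE;
	size_t i;
	int moved;

	assert(align && (align & (align - 1)) == 0); /* power of two */

	do {
		moved = 0;
		addr = (addr + align - 1) & ~(align - 1);
		for (i = 0; i < mb->cnt; i++) {
			uint32_t rb = mb->reserved[i].base;
			uint32_t re = rb + mb->reserved[i].size;

			if (addr < re && addr + size > rb) {
				addr = re;  /* skip past the conflicting region */
				moved = 1;
			}
		}
	} while (moved);

	if (addr + size > mb->ram_end || toy_reserve(mb, addr, size))
		return 0;
	return addr;
}
```

With only the kernel range [0x00010000-0x018313cf] reserved,
toy_alloc(&mb, 10913, 0x40) returns 0x1000 and a following
toy_alloc(&mb, 32680, 0x4) returns 0x3aa4, the same addresses as in the
log. If toy_reserve(&mb, 0x0, 0x1000 + 0x100*64) is called first, the
first allocation moves up to 0x5000, which is the effect the workaround
relies on.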

> I would like to submit a fix for 5.12-rc1 and get it backported into
> 5.11 so that BMIPS machines boot again; that will essentially be your
> earlier proposed fix.
>
> BMIPS is the only "legacy" MIPS platform that defines an exception base,
> so while this problem may certainly exist with other platforms, I do
> wonder how likely it is there, though?

Hm, at least we can be sure that the problem exists for each platform
which conforms to the !cpu_has_mips_r2_r6 condition and which has the VEIC/
VINT capability. Those platforms may go beyond the first PAGE_SIZE of
memory when initializing the exception table, thus corrupting memory
possibly allocated for something else. In my case the problem doesn't
manifest itself because the CPU is MIPS32r5.
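
To put rough numbers on that, here is a small sketch of the arithmetic.
It is not kernel code: the helper names are made up, PAGE_SIZE is assumed
to be 4 KiB, and VECTORSPACING is assumed to be 0x100 (which is what the
0x100*64 term in the workaround corresponds to). The sizes 0x400 and
0x200 + VECTORSPACING*64 are the ones visible in the trap_init() diff
quoted below.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 0x1000u  /* assumption: 4 KiB pages */
#define VECTORSPACING     0x100u   /* assumption: spacing between vectors */

/* Vector area size, following the two vec_size values in the quoted diff. */
static uint32_t vec_area_size(int has_veic_or_vint)
{
	return has_veic_or_vint ? 0x200u + VECTORSPACING * 64 : 0x400u;
}

/*
 * Hypothetical helper: how many bytes of the vector table
 * [ebase_pa, ebase_pa + used) lie outside the reserved range
 * [0, reserved).
 */
static uint32_t unprotected_bytes(uint32_t reserved, uint32_t ebase_pa,
				  uint32_t used)
{
	uint32_t end = ebase_pa + used;

	assert(end >= ebase_pa);  /* no 32-bit wrap-around */

	if (end <= reserved)
		return 0;
	return end - (ebase_pa > reserved ? ebase_pa : reserved);
}
```

Under these assumptions vec_area_size(1) is 0x4200, well above one page,
and unprotected_bytes(0x400, 0x0, 0x4200) is 0x3e00. For the BMIPS case,
with the table at physical 0x1000 and 0x100*64 bytes of vectors,
unprotected_bytes(0x400, 0x1000, 0x4000) is 0x4000, while reserving
0x1000 + 0x100*64 bytes from address 0 leaves nothing unprotected.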

-Sergey

>
> >
> > Some food for thought for everyone (Thomas, Mark, please join the
> > discussion). What we've got here is a somewhat bigger problem. AFAICS
> > if bottom-up allocation is enabled (as in our case) memblock_find_in_range_node()
> > performs the allocation above the very first PAGE_SIZE memory chunk
> > (see that method's code for details). So we are currently on the safe side
> > for some older MIPS platforms. But platforms with VEIC/VINT may get
> > into the same trouble here if they don't reserve the exception memory
> > early enough, before the kernel starts random allocations from
> > memblock. So we either need to provide a generic workaround for that
> > or make sure each platform reserves the vectors itself, for instance
> > in the plat_mem_setup() method.
> >
> > -Sergey
> >
> >>
> >> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> >> index e0352958e2f7..b0a173b500e8 100644
> >> --- a/arch/mips/kernel/traps.c
> >> +++ b/arch/mips/kernel/traps.c
> >> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> >>
> >>  	if (!cpu_has_mips_r2_r6) {
> >>  		ebase = CAC_BASE;
> >> -		ebase_pa = virt_to_phys((void *)ebase);
> >>  		vec_size = 0x400;
> >> -
> >> -		memblock_reserve(ebase_pa, vec_size);
> >>  	} else {
> >>  		if (cpu_has_veic || cpu_has_vint)
> >>  			vec_size = 0x200 + VECTORSPACING*64;
> >> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> >>
> >>  	if (board_ebase_setup)
> >>  		board_ebase_setup();
> >> +
> >> +	/* board_ebase_setup() can change the exception base address
> >> +	 * reserve it now after changes were made.
> >> +	 */
> >> +	if (!cpu_has_mips_r2_r6) {
> >> +		ebase_pa = virt_to_phys((void *)ebase);
> >> +		memblock_reserve(ebase_pa, vec_size);
> >> +	}
> >>  	per_cpu_trap_init(true);
> >>  	memblock_set_bottom_up(false);
> >> --
> >> Florian
>
> --
> Florian