Re: [PATCH] MIPS: reserve the memblock right after the kernel
From: Serge Semin
Date: Wed Nov 11 2020 - 09:52:48 EST
Hello Alexander
On Tue, Nov 10, 2020 at 11:29:50AM +0100, Alexander Sverdlin wrote:
> Hello Thomas,
>
> On 10/11/2020 10:55, Thomas Bogendoerfer wrote:
> >>>> Linux doesn't own the memory immediately after the kernel image. On Octeon
> >>>> bootloader places a shared structure right close after the kernel _end,
> >>>> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c.
> >>>>
> >>>> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc()
> >>>> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns
> >>>> memory block overlapping with the above octeon_bootinfo structure, which
> >>>> is being overwritten afterwards.
> >>> as this is special for Octeon, how about adding the memblock_reserve
> >>> in Octeon-specific code?
> >> while the shared structure which is being corrupted is indeed Octeon-specific,
> >> the wrong assumption that the memory right after the kernel can be allocated by memblock
> >> allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem().
> > ok, I see your point. IMHO this whole check_kernel_sections_mem() should
> > be removed. IMHO memory adding should only be done by memory detection code.
> >
> > Could you send a patch which removes check_kernel_sections_mem() completely?
>
> this will expose one issue:
> platforms usually do it in a sane way, as has been done for the last 15 years,
> namely they add the kernel image without the partial pages on the boundaries.
> This will lead to the situation, that request_resource() will fail at least
> for .bss section of the kernel and it will not be properly displayed under
> /proc/iomem (and probably same problem will appear, which initially motivated
> the creation of check_kernel_sections_mem()).
Are you saying that some old platforms rely on
check_kernel_sections_mem() adding the memory occupied by the
kernel to the system? If so, do you have an example of one?
Personally, my hands were also itching to remove that method years ago,
but I didn't dare to do so for the very same reason... On the other
hand, if we identified all the platforms that need that method, we could
move it to their prom_init() or something similar and get rid of that
atavism for good.
>
> As I understood, the issue is that memblock API operates internally on the
> page granularity (at least there are many ROUND_DOWN() inside for the size
> or upper boundary),
Hm, I don't think so. Memblock doesn't work with page granularity,
but with memory ranges. round_down()/round_up() are used to find a memory
range with proper alignment. (See the
__memblock_find_range_{bottom_up,top_down}() implementations.)
Memblock allocates a memory region with exactly the size and alignment
requested. That's the beauty of that allocator and one of the reasons
why the kernel platforms have been painfully converted to use it instead
of the old bootmem allocator. BTW the latter indeed operated
with page granularity.
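To illustrate the point, here is a minimal userspace sketch of a top-down
search inside a single free range. It is not the kernel code: find_top_down()
and the ROUND_* macros are hypothetical names modelled on the round_down()
usage in the memblock range finders, and the alignment is assumed to be a
power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Power-of-two alignment helpers (userspace stand-ins for round_down()/round_up()). */
#define ROUND_DOWN(x, a) ((uint64_t)(x) & ~((uint64_t)(a) - 1))
#define ROUND_UP(x, a)   ROUND_DOWN((uint64_t)(x) + (uint64_t)(a) - 1, (a))

/*
 * Toy top-down search in one free range [start, end): return the highest
 * base aligned to `align` at which `size` bytes still fit, or 0 if nothing
 * fits. Using 0 as "not found" assumes start > 0 (hypothetical convention
 * for this sketch).
 */
static uint64_t find_top_down(uint64_t start, uint64_t end,
                              uint64_t size, uint64_t align)
{
    uint64_t cand;

    assert(align && (align & (align - 1)) == 0); /* power of two only */

    if (end < size)
        return 0;

    /* Rounding affects only the base of the candidate, never the size. */
    cand = ROUND_DOWN(end - size, align);
    return (cand >= start && cand < end) ? cand : 0;
}
```

With a free range of [0x1000, 0x8123), a request for 0x30 bytes aligned to
0x10 yields base 0x80f0, so the result covers exactly [0x80f0, 0x8120). No
page rounding is applied to the size unless the caller asks for page
alignment.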
Getting back to the memblock allocator: it deals with pages only when
the kernel gets to starting the buddy allocator. At that point the kernel
invokes memblock_free_all(), which eventually calls
free_low_memory_core_early()->__free_memory_core(). The latter method indeed
frees the memory pages, but as you can see it does so with the correct
alignment: PFN_UP(phys_start)/PFN_DOWN(end).
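A small userspace sketch of that rounding, assuming 4 KiB pages. The macros
are redefined locally to mirror the kernel's PFN_UP()/PFN_DOWN(), and
whole_pages() is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* assumption: 4 KiB pages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PFN_UP(x)   (((uint64_t)(x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((uint64_t)(x) >> PAGE_SHIFT)

/*
 * Number of whole pages inside the byte range [start, end).
 * The start is rounded up and the end is rounded down, so partially
 * covered pages at either boundary are never counted (never "freed").
 */
static uint64_t whole_pages(uint64_t start, uint64_t end)
{
    uint64_t start_pfn, end_pfn;

    assert(start <= end);
    start_pfn = PFN_UP(start);
    end_pfn = PFN_DOWN(end);
    return end_pfn > start_pfn ? end_pfn - start_pfn : 0;
}
```

For the range [0x1234, 0x5678) this gives PFNs 2..4, i.e. three pages, and
the partial pages at both ends stay out of the buddy allocator. A range such
as [0x1000, 0x1800), which covers no complete page, yields zero.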
> so for request_resource() to succeed one has to claim
> the rest of the .bss last page. And with current memblock API
> memblock_reserve() must appear somewhere, be it ARCH or platform code.
After a short glance at the request_resource() code I couldn't
find a reason why the method would fail to request a page-unaligned
region. AFAICS it will fail only if the memory occupied by the kernel
hasn't been registered as system memory. The latter case may happen
only on systems which rely on check_kernel_sections_mem()
being called in the generic arch_mem_init(). Of course we
shouldn't blindly remove it, but instead move it to the
platforms which have been unfortunate enough not to add the kernel
memory to the system memory pool.
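A simplified userspace model of the bounds check as I read it, just a sketch.
struct range and fits_in_parent() are hypothetical names, the end address is
inclusive like in struct resource, and conflicts with sibling resources are
deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive [start, end] range, modelled loosely on struct resource. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Return 1 if `child` lies entirely within `parent`, 0 otherwise.
 * Only byte addresses are compared, so page alignment of the child
 * plays no role in the outcome.
 */
static int fits_in_parent(const struct range *parent,
                          const struct range *child)
{
    assert(parent->start <= parent->end);

    if (child->end < child->start)
        return 0;
    return child->start >= parent->start && child->end <= parent->end;
}
```

With a "System RAM" range of [0x80000000, 0x8fffffff], a page-unaligned .bss
of [0x80a01234, 0x80a45677] fits. It stops fitting only if the registered
memory ends before the section does, for example at 0x80a44fff.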
So IMHO the best conclusion in the context of this patch would be:
1) As Thomas said, any platform-specific reservation should be done in the
platform-specific code. That means if Octeon needs some memory behind
the kernel to be reserved, then that should be done, for example, in
prom_init().
2) The check_kernel_sections_mem() method can be removed. But it
should be done carefully. We at least need to try to find all the
platforms which rely on its functionality.
-Sergey
>
> --
> Best regards,
> Alexander Sverdlin.