Re: [PATCH] remove the initrd resource in /proc/iomem as the initrd has freed the reserved memblock.

From: James Morse
Date: Tue Jul 02 2019 - 11:18:23 EST


Hello,

On 02/07/2019 11:34, Yi Wang wrote:
> From: Junhua Huang <huang.junhua@xxxxxxxxxx>
>
> The 'commit 50d7ba36b916 ("arm64: export memblock_reserve()d regions via /proc/iomem")'
> show the reserved memblock in /proc/iomem. But the initrd's reserved memblock
> will be freed in free_initrd_mem(), which executes after the reserve_memblock_reserved_regions().
> So there are some incorrect information shown in /proc/iomem. e.g.:
> 80000000-bbdfffff : System RAM
> 80080000-813effff : Kernel code
> 813f0000-8156ffff : reserved
> 81570000-817fcfff : Kernel data
> 83400000-83ffffff : reserved
> 90000000-90004fff : reserved
> b0000000-b2618fff : reserved
> b8c00000-bbbfffff : reserved
> In this case, the range from b0000000 to b2618fff is reserved for initrd, which should be
> clean from the resource tree after it was freed.

(There was some discussion about this over-estimate on the list, but it didn't make it
into the commit message.) I think a reserved->free change is fine. If user-space thinks
its still reserved nothing bad happens.


> As kexec-tool will collect the iomem reserved info
> and use it in second kernel, which causes error message generated a second time.

What error message?

These second-level reserved regions are to avoid kexec overwriting ACPI tables, or bits of
the UEFI memory map with its data.

The second kernel should not expect to find the initramfs here...


> At the same time, we should free the reserved memblock in an aligned manner because
> the initrd reserves the memblock in an aligned manner in arm64_memblock_init().
> Otherwise there are some fragments in memblock_reserved regions. e.g.:
> /sys/kernel/debug/memblock # cat reserved
> 0: 0x0000000080080000..0x00000000817fafff
> 1: 0x0000000083400000..0x0000000083ffffff
> 2: 0x0000000090000000..0x000000009000407f
> 3: 0x00000000b0000000..0x00000000b000003f
> 4: 0x00000000b26184ea..0x00000000b2618fff
> The fragments like the ranges from b0000000 to b000003f and from b26184ea to b2618fff
> should be freed.

I agree we should fix this. Could it be done as a separate patch, as this is a different
issue?


> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index d2adffb81b5d..14ba8113eab5 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -580,8 +580,16 @@ void free_initmem(void)
> #ifdef CONFIG_BLK_DEV_INITRD
> void __init free_initrd_mem(unsigned long start, unsigned long end)
> {
> + struct resource *res = NULL;

> free_reserved_area((void *)start, (void *)end, 0, "initrd");

Could we do this last? This calls __free_pages(), once we've done that it could be
allocated somewhere else, but memblock and iomem still say this is reserved memory.


> - memblock_free(__virt_to_phys(start), end - start);
> + start = __virt_to_phys(start) & PAGE_MASK;
> + end = PAGE_ALIGN(__virt_to_phys(end));
> + memblock_free(start, end - start);

> + res = lookup_resource(&iomem_resource, memblock_start_of_DRAM());
> + if (res != NULL)

("if (res)" is the preferred style)

> + __release_region(res, start, end - start);

Hmm, I can't see where __release_region() splits start/end out of an existing resource...

I think this means if you have memblock_reserved() pages either size of the initramfs they
get merged by memblock and put into iomem_resource as one entry. We'd hit
__release_regions()s:
| if (res->start != start || res->end != end)
| break;
[...]
| printk(KERN_WARNING "Trying to free nonexistent resource "
| "<%016llx-%016llx>\n", (unsigned long long)start,
| (unsigned long long)end);

It looks like __release_region() is only for exactly the region you reserved. We need it
to split one resource into two, the reverse of what reserve_region_with_split() does.

I can't see a helper that would do this.


An alternative is to explicitly create an iomem_resource for the initramfs, like we do for
the kernel text/data. reserve_region_with_split() will then fill in the gaps around it if
there are any, and you can pass it to release_resource() in here instead of searching for it.

It should be possible to test by rounding the initramfs base/size in arm64_memblock_init()
by a bit more.


Thanks,

James