[PATCH] kexec: fix out-of-bounds access to the ELF headers buffer in the kexec_file_load() syscall

From: Lee, Chun-Yi
Date: Mon Sep 28 2015 - 02:41:53 EST


On big machines, the number of CPUs can be large enough that the ELF headers
nearly fill the whole page-aligned buffer (4096, 8192, ... bytes). On such
machines, a page fault happens randomly.

This patch modifies fill_up_crash_elf_data() to use walk_system_ram_res()
instead of walk_system_ram_range() when counting the maximum number of crash
memory ranges. walk_system_ram_range() filters out small memory regions that
do not contain at least one whole page, but walk_system_ram_res() does not.

The original page fault sometimes occurs on big machines when preparing the
ELF headers:

[ 305.291522] BUG: unable to handle kernel paging request at ffffc90613fc9000
[ 305.299621] IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
[ 305.308300] PGD e000032067 PUD 6dcbec54067 PMD 9dc9bdeb067 PTE 0
[ 305.315393] Oops: 0002 [#1] SMP
[...snip]
[ 305.420953] task: ffff8e1c01ced600 ti: ffff8e1c03ec2000 task.ti: ffff8e1c03ec2000
[ 305.429292] RIP: 0010:[<ffffffff8103d645>] [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
[...snip]

After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
I found that the code uses walk_system_ram_res() to fill in the crash memory
region information in the program headers, so it includes the small memory
regions that do not span a whole page. However, fill_up_crash_elf_data() uses
walk_system_ram_range() to count the number of crash memory regions, and that
walk filters out those small regions.

I printed those small memory regions, for example:

kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0

Based on the logic of walk_system_ram_range(), this memory region is filtered
out:

pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
end_pfn = (0x77592258 + 0xdc0 - 1 + 1) >> 12 = 0x77593
end_pfn - pfn = 0x77593 - 0x77593 = 0 <=== if (end_pfn > pfn) [FAIL]

So the max_nr_ranges counted by the kernel does not include the small memory
regions, which causes the page fault later in the code path that prepares the
ELF headers.

This issue was hidden on small machines that do not have many CPUs, because
the free space at the end of the ELF headers buffer could hold the uncounted
small memory regions. But when the machine has more CPUs, or the number of
memory regions nearly fills the whole page-aligned buffer (4096, 8192, ...
bytes), the issue happens randomly.

Signed-off-by: Lee, Chun-Yi <jlee@xxxxxxxx>
---
arch/x86/kernel/crash.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e068d66..ad273b3d 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -185,8 +185,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_KEXEC_FILE
-static int get_nr_ram_ranges_callback(unsigned long start_pfn,
-				unsigned long nr_pfn, void *arg)
+static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
 {
 	int *nr_ranges = arg;
 
@@ -214,7 +213,7 @@ static void fill_up_crash_elf_data(struct crash_elf_data *ced,
 
 	ced->image = image;
 
-	walk_system_ram_range(0, -1, &nr_ranges,
+	walk_system_ram_res(0, -1, &nr_ranges,
 				get_nr_ram_ranges_callback);
 
 	ced->max_nr_ranges = nr_ranges;
--
2.1.4
