Re: [PATCH v11 7/7] x86/crash: Add x86 crash hotplug support

From: Eric DeVolder
Date: Tue Sep 06 2022 - 12:12:52 EST




On 9/4/22 22:19, Baoquan He wrote:
On 08/26/22 at 01:37pm, Eric DeVolder wrote:
For x86_64, when CPU or memory is hot un/plugged, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

When loading the crash kernel via kexec_load or kexec_file_load,
the elfcorehdr is identified at run time in
crash_core:handle_hotplug_event().

To update the elfcorehdr for x86_64, a new elfcorehdr must be
generated from the available CPUs and memory. The new elfcorehdr
is prepared into a buffer, and then installed over the top of
the existing elfcorehdr.

In the patch 'kexec: exclude elfcorehdr from the segment digest'
the need to update purgatory due to the change in elfcorehdr was
eliminated. As a result, no changes to purgatory or boot_params
(as the elfcorehdr= kernel command line parameter pointer
remains unchanged and correct) are needed, just elfcorehdr.

To accommodate a growing number of resources via hotplug, the
elfcorehdr segment must be sufficiently large enough to accommodate
changes, see the CRASH_MAX_MEMORY_RANGES configure item.

With this change, crash hotplug for kexec_file_load syscall
is supported. The kexec_load is also supported, but also
requires a corresponding change to userspace kexec-tools.

Signed-off-by: Eric DeVolder <eric.devolder@xxxxxxxxxx>
---
arch/x86/Kconfig | 11 ++++
arch/x86/include/asm/kexec.h | 20 +++++++
arch/x86/kernel/crash.c | 102 +++++++++++++++++++++++++++++++++++
3 files changed, 133 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f9920f1341c8..cdfc9b2fdf98 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2056,6 +2056,17 @@ config CRASH_DUMP
(CONFIG_RELOCATABLE=y).
For more details see Documentation/admin-guide/kdump/kdump.rst
+config CRASH_MAX_MEMORY_RANGES
+ depends on CRASH_DUMP && KEXEC_FILE && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+ int
+ default 32768
+ help
+ For the kexec_file_load path, specify the maximum number of
+ memory regions, eg. as represented by the 'System RAM' entries
+ in /proc/iomem, that the elfcorehdr buffer/segment can accommodate.
+ This value is combined with NR_CPUS and multiplied by Elf64_Phdr
+ size to determine the final buffer size.
+
config KEXEC_JUMP
bool "kexec jump"
depends on KEXEC && HIBERNATION
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index a3760ca796aa..432073385b2d 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -212,6 +212,26 @@ typedef void crash_vmclear_fn(void);
extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
extern void kdump_nmi_shootdown_cpus(void);
+void *arch_map_crash_pages(unsigned long paddr, unsigned long size);
+#define arch_map_crash_pages arch_map_crash_pages
+
+void arch_unmap_crash_pages(void **ptr);
+#define arch_unmap_crash_pages arch_unmap_crash_pages
+
+void arch_crash_handle_hotplug_event(struct kimage *image,
+ unsigned int hp_action);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif

Check these two functions again, wonder whether this is a common usage
to show support of a feature. I don't remember where we do like this. We
usually define a global variable to mark it? Anyway, it works. See if
other people have comments, I could be ignorant.

These functions are for use in generating the value of the associated the sysfs attribute.

The default-value function (which previously was a __weak function) is in include/linux/kexec.h and the arch-specific override in arch/<arch>/include/asm/kexec.h. This default-value function sets the return to 0, for not supported.

I looked around a bit, and don't necessarily see a precedent. Then again some of this conversion of away from __weak is still making its way. I did briefly experiment with using a variable but decided there would still need to be a similar function needed (mostly likely on the subsys_initcall) in order to properly set the variable. So with the move away from __weak, it seemed to make sense to just keep the individual functions as they are now, in the spirit of the move away from __weak.

If the variable is still desired, please let me know. I'll post v12 at the end of the week to incorporate the fixes for some of the nits previously identified!


Other than this tiny concern, this patch looks good to me:

Acked-by: Baoquan He <bhe@xxxxxxxxxx>

Thank you!
eric


+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9ceb93c176a6..8fc7d678ac72 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -25,6 +25,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/memblock.h>
+#include <linux/highmem.h>
#include <asm/processor.h>
#include <asm/hardirq.h>
@@ -397,7 +398,18 @@ int crash_load_segments(struct kimage *image)
image->elf_headers = kbuf.buffer;
image->elf_headers_sz = kbuf.bufsz;
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
+ /* Ensure elfcorehdr segment large enough for hotplug changes */
+ kbuf.memsz =
+ (CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES) *
+ sizeof(Elf64_Phdr);
+ /* Mark as usable to crash kernel, else crash kernel fails on boot */
+ image->elf_headers_sz = kbuf.memsz;
+ image->elfcorehdr_index = image->nr_segments;
+ image->elfcorehdr_index_valid = true;
+#else
kbuf.memsz = kbuf.bufsz;
+#endif
kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer(&kbuf);
@@ -412,3 +424,93 @@ int crash_load_segments(struct kimage *image)
return ret;
}
#endif /* CONFIG_KEXEC_FILE */
+
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
+/*
+ * NOTE: The addresses and sizes passed to this routine have
+ * already been fully aligned on page boundaries. There is no
+ * need for massaging the address or size.
+ */
+void *arch_map_crash_pages(unsigned long paddr, unsigned long size)
+{
+ void *ptr = NULL;
+
+ if (size > 0) {
+ struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
+
+ ptr = kmap_local_page(page);
+ }
+
+ return ptr;
+}
+
+void arch_unmap_crash_pages(void **ptr)
+{
+ if (ptr) {
+ if (*ptr)
+ kunmap_local(*ptr);
+ *ptr = NULL;
+ }
+}
+
+/**
+ * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
+ * @image: the active struct kimage
+ * @hp_action: the hot un/plug action being handled
+ *
+ * To accurately reflect hot un/plug changes, the new elfcorehdr
+ * is prepared in a kernel buffer, and then it is written on top
+ * of the existing/old elfcorehdr.
+ */
+void arch_crash_handle_hotplug_event(struct kimage *image,
+ unsigned int hp_action)
+{
+ struct kexec_segment *ksegment;
+ unsigned char *ptr = NULL;
+ unsigned long elfsz = 0;
+ void *elfbuf = NULL;
+ unsigned long mem, memsz;
+
+ /*
+ * Elfcorehdr_index_valid checked in crash_core:handle_hotplug_event()
+ */
+ ksegment = &image->segment[image->elfcorehdr_index];
+ mem = ksegment->mem;
+ memsz = ksegment->memsz;
+
+ /*
+ * Create the new elfcorehdr reflecting the changes to CPU and/or
+ * memory resources.
+ */
+ if (prepare_elf_headers(image, &elfbuf, &elfsz)) {
+ pr_err("crash hp: unable to prepare elfcore headers");
+ goto out;
+ }
+ if (elfsz > memsz) {
+ pr_err("crash hp: update elfcorehdr elfsz %lu > memsz %lu",
+ elfsz, memsz);
+ goto out;
+ }
+
+ /*
+ * At this point, we are all but assured of success.
+ * Copy new elfcorehdr into destination.
+ */
+ ptr = arch_map_crash_pages(mem, memsz);
+ if (ptr) {
+ /*
+ * Temporarily invalidate the crash image while the
+ * elfcorehdr is updated.
+ */
+ xchg(&kexec_crash_image, NULL);
+ memcpy_flushcache((void *)ptr, elfbuf, elfsz);
+ xchg(&kexec_crash_image, image);
+ }
+ arch_unmap_crash_pages((void **)&ptr);
+ pr_debug("crash hp: re-loaded elfcorehdr at 0x%lx\n", mem);
+
+out:
+ if (elfbuf)
+ vfree(elfbuf);
+}
+#endif
--
2.31.1