Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support

From: Sourabh Jain
Date: Wed Oct 12 2022 - 00:57:06 EST



On 08/10/22 01:30, Eric DeVolder wrote:


On 10/4/22 04:10, Sourabh Jain wrote:

On 30/09/22 21:06, Eric DeVolder wrote:


On 9/28/22 11:07, Borislav Petkov wrote:
On Tue, Sep 13, 2022 at 02:12:31PM -0500, Eric DeVolder wrote:
This topic was discussed previously https://lkml.org/lkml/2022/3/3/372.

Please do not use lkml.org to refer to lkml messages. We have a
perfectly fine archival system at lore.kernel.org. You simply do

https://lore.kernel.org/r/<Message-ID>

when you want to point to a previous mail.

ok, thanks for pointing that out to me.

David points out that terminology is tricky here due to differing behaviors.
And perhaps that is your point in asking for guidance text. It can be
complicated

Which means you need an explanation how to use this even more.

And why is CONFIG_CRASH_MAX_MEMORY_RANGES even a Kconfig item and not
something you discover from the hardware?

No, is the short answer.


Your help text talks about System RAM entries in /proc/iomem which means
that those entries are present somewhere in the kernel and you can read
them out and do the proper calculations dynamically instead of doing the
static CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES thing.

The intent is to compute the max size buffer needed to contain a maximum populated elfcorehdr, which is primarily based on the number of CPUs and memory regions. Thus far I (and others involved) have not found a kernel method to determine the maximum number of memory regions possible (if you are aware of one, please let me know!). Thus CONFIG_CRASH_MAX_MEMORY_RANGES was born (rather borrowed from kexec-tools).

So no dynamic computation is possible, yet.
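
To make the sizing argument above concrete, here is a minimal sketch of how an upper bound for the elfcorehdr buffer could be computed from a CPU count and a memory-range count. It assumes one PT_NOTE program header per CPU, one for vmcoreinfo, and one PT_LOAD per memory range; that layout and the helper name demo_elfcorehdr_bound() are assumptions for illustration, not code from the patch.

```c
#include <stddef.h>

/* Fixed sizes of the ELF64 structures (Elf64_Ehdr and Elf64_Phdr). */
#define DEMO_ELF64_EHDR_SIZE 64u
#define DEMO_ELF64_PHDR_SIZE 56u

/*
 * Hypothetical helper: worst-case elfcorehdr size in bytes.
 * Assumes one PT_NOTE per possible CPU, one PT_NOTE for vmcoreinfo,
 * and one PT_LOAD per memory range.
 */
static size_t demo_elfcorehdr_bound(size_t max_cpus, size_t max_mem_ranges)
{
	size_t nr_phdr = max_cpus + 1 + max_mem_ranges;

	return DEMO_ELF64_EHDR_SIZE + nr_phdr * DEMO_ELF64_PHDR_SIZE;
}
```

Both inputs are static limits in this sketch, which mirrors the CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES approach discussed here.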


, but it all comes down to System RAM entries.

I could perhaps offer an overly simplified example such that for 1GiB block
size, for example, the CRASH_MAX_MEMORY_RANGES of 32768 would allow for 32TiB
of memory?

Yes, and stick it somewhere in Documentation/admin-guide/kdump/ and
refer to it in that help text so that people can find it and read how to
use your new option.

ok
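
The simplified example above (1GiB blocks, 32768 ranges, 32TiB) can be checked with a short sketch. It assumes every memory range covers exactly one memory block, which is the worst case for the range count; the helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the most memory that can be described when each
 * of max_ranges entries covers exactly one block of block_bytes.
 */
static uint64_t demo_max_described_bytes(uint64_t max_ranges, uint64_t block_bytes)
{
	return max_ranges * block_bytes;
}
```

With max_ranges = 32768 and block_bytes = 1GiB (1ULL << 30) the result is 2^45 bytes, i.e. 32TiB. Adjacent blocks that merge into one range would let the same limit cover more memory.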

The kbuf.bufsz value is obtained via a call to prepare_elf_headers(); I
cannot initialize it at its declaration.

Sorry, I meant this:

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 8fc7d678ac72..ee6fd9f1b2b9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -395,8 +395,9 @@ int crash_load_segments(struct kimage *image)
      if (ret)
          return ret;
-    image->elf_headers = kbuf.buffer;
-    image->elf_headers_sz = kbuf.bufsz;
+    image->elf_headers    = kbuf.buffer;
+    image->elf_headers_sz    = kbuf.bufsz;
+    kbuf.memsz        = kbuf.bufsz;
    #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
      /* Ensure elfcorehdr segment large enough for hotplug changes */
@@ -407,9 +408,8 @@ int crash_load_segments(struct kimage *image)
      image->elf_headers_sz = kbuf.memsz;
      image->elfcorehdr_index = image->nr_segments;
      image->elfcorehdr_index_valid = true;
-#else
-    kbuf.memsz = kbuf.bufsz;
  #endif
+
      kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
      kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
      ret = kexec_add_buffer(&kbuf);

ok

I'm at a loss as to what to do differently here. You've raised this issue
before and I went back and looked at the suggestions then and I don't see
how that applies to this situation. How is this situation different than the
#ifdef CONFIG_KEXEC_FILE that immediately precedes it?

See the diff at the end. I'm not saying this is how you should do it
but it should give you a better idea. The logic being, the functions
in the .c file don't really need ifdeffery around them - you're adding
1-2 functions and crash.c is not that big - so they can be built in
unconditionally. You'd need the ifdeffery *in the header only* when
crash.c is not being built.
ok; I've overlooked that scenario.

But I've done it with ifdeffery in the .c file now because yes, the
kexec code is a minefield of ifdeffery. Hell, there's ifdeffery even in
the headers for structs. Ifdeffery you don't really need. Someone should
clean that up and simplify this immensely.

ok
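
For readers unfamiliar with the "ifdeffery in the header only" pattern described above, a minimal single-file sketch follows. All names (CONFIG_DEMO_CRASH, demo_update_elfcorehdr(), demo_on_cpu_hotplug()) are hypothetical and not taken from the patch; the point is only that callers and the .c file stay free of #ifdef.

```c
/* ---- demo_crash.h (hypothetical) ---- */
#ifdef CONFIG_DEMO_CRASH
/* demo_crash.c is built: the real function is defined there, unconditionally. */
int demo_update_elfcorehdr(int nr_cpus);
#else
/* demo_crash.c is not built: provide a no-op stub so callers still compile. */
static inline int demo_update_elfcorehdr(int nr_cpus)
{
	(void)nr_cpus;
	return 0;
}
#endif

/* ---- caller (hypothetical): no #ifdef needed here ---- */
int demo_on_cpu_hotplug(int nr_cpus)
{
	return demo_update_elfcorehdr(nr_cpus);
}
```

Compiled without CONFIG_DEMO_CRASH, the stub is used and demo_on_cpu_hotplug() returns 0.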


Currently there is a concurrent effort for PPC support by Sourabh
Jain, and in that effort arch_map_crash_pages() is using __va(paddr).

Why?

I do not know the nuances between kmap_local_page() and __va() to
answer the question.

kmap_local_page() is a generic interface and it should work on any arch.

And it is documented even:

$ git grep kmap_local_page Documentation/

If kmap_local_page() works for all archs, then I'm happy to drop these
arch_ variants and use it directly.

Yes, pls do.

I'll check with Sourabh to see if PPC can work with kmap_local_page().
I think kmap_local_page() is supported on PowerPC. But can you explain why we need this
function here? Isn't the reserved memory already available to use?

On x86, attempts to access the elfcorehdr without mapping it did not work (resulted
in a fault).

Let me know if using kmap_local_page() in place of __va() in arch_map_crash_pages() works.
If it does, then I can eliminate arch_un/map_crash_pages() and use kmap_local_page()
directly.
Hello Eric,

At least on ppc64 we have a direct mapping available, and hence just by doing a page shift
on the physical address (__va) we can get a valid virtual address on powerpc. In short, we don't
have to generate the mapping again to access the reserved region.
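
As an illustration of the direct-mapping point, here is a minimal userspace sketch. It assumes the direct map is a fixed offset between physical and virtual addresses; the offset value and the demo_va()/demo_pa() names are illustrative only, and the real __va() is architecture- and configuration-specific.

```c
#include <stdint.h>

/* Illustrative direct-map base; not read from any real kernel config. */
#define DEMO_PAGE_OFFSET 0xc000000000000000ULL

/* Physical -> virtual under a linear (direct) mapping. */
static uint64_t demo_va(uint64_t paddr)
{
	return paddr + DEMO_PAGE_OFFSET;
}

/* Virtual -> physical, the inverse of demo_va(). */
static uint64_t demo_pa(uint64_t vaddr)
{
	return vaddr - DEMO_PAGE_OFFSET;
}
```

Because the translation is pure arithmetic, no page-table setup is needed, which is why no extra mapping step is required on an arch with such a direct map.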

Regardless, let's go with the kmap_local_page() API; it is supported on powerpc.

Thanks,
Sourabh Jain