Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support

From: Eric DeVolder
Date: Fri Oct 28 2022 - 11:31:19 EST

Next message: Pierre Gondois: "[PATCH v2] sched/topology: Remove EM_MAX_COMPLEXITY limit"
Previous message: Guenter Roeck: "Re: [RFC][PATCH v2 00/31] timers: Use del_timer_shutdown() before freeing timers"
In reply to: Borislav Petkov: "Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support"
Next in thread: Borislav Petkov: "Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 10/28/22 05:19, Borislav Petkov wrote:

On Thu, Oct 27, 2022 at 02:24:11PM -0500, Eric DeVolder wrote:

Be aware, in reality, that if the system was fully populated, it would not
actually consume all 8192 phdrs. Rather /proc/iomem would essentially show a
large contiguous address space which would require just a single phdr.

Then that from below:

pnum += CONFIG_CRASH_MAX_MEMORY_RANGES;

which then would end up allocating 8192 would be a total waste.

So why don't you make that number dynamic then?

You start with something sensible:

total_num_pheaders = num_online_cpus() + "some number of regions" + "some few others"

I.e., a number which is a good compromise on the majority of machines.

Then, on hotplug events you count how many new regions are coming in
and when you reach the total_num_pheaders number, you double it (or some
other increase stragegy), reallocate the ELF header buffers etc needed
for kdump and you're good.

This way, you don't waste memory unnecessarily on the majority of
systems and those who need more, get to allocate more.

This patch series sizes and allocates the memory buffer/segment for the elfcorehdr once, at kdump load time.

In order to dynamically resize the elcorehdr memory buffer/segment, that causes the following ripple effects:

- Permitting resizing of the elfcorehdr requires a means to "allocate" a new size buffer from within the crash kernel reserved area. There is no allocator today; currently it is a kind of one-pass placement process that happens at load time. The critical side effect of allocating a new elfcorehdr buffer memory/segment is that it creates a new address for the elfcorehdr.

- The elfcorehdr is passed to the crash kernel via the elfcorehdr= kernel cmdline option. As such, a dynamic change to the size of the elfcorehdr size necessarily invites a change of address of that buffer, and therefore a change to rewrite the crash kernel cmdline to reflect the new elfcorehdr buffer address.

- A change to the cmdline, also invites a possible change of address of the buffer containing the cmdline, and thus a change to the x86 boot_params, which contains the cmdline pointer.

- A change to the cmdline and/or boot_params, which are *not* excluded from the hash/digest, means that the purgatory hash/digest needs to be recomputed, and purgatory re-linked with the new hash/digest and replaced.

A fair amount of work, but I have had this working in the past, around the v1 patch series timeframe. However, it only worked for the kexec_file_load() syscall as all the needed pieces of information were available; but for kexec_load(), it isn't possible to relink purgatory as by that point purgatory is but a user-space binary blob.

It was feedback on the v1/v2 that pointed out that by excluding the elfcorehdr from the hash/digest, the "change of address" problem with the elfcorehdr buffer/segment goes away, and, in turn, negates the need to: introduce an allocator for the crash kernel reserved space, rewrite the crash kernel cmdline with a new elfcorehdr, update boot_params with a new cmdline and re-link and replace purgatory with the updated digest. And it enables this hotplug efforts to support kexec_load() syscall as well.

So it is with this in mind that I suggest we stay with the statically sized elfcorehdr buffer.

If that can be agreed upon, then it is "just a matter" of picking a useful elfcorehdr size. Currently that size is derived from the NR_DEFAULT_CPUS and CRASH_MAX_MEMORY_RANGES. So, there is still the CRASH_MAX_MEMORY_RANGES knob to help a dial in size, should there be some issue with the default value/size.

Or if there is desire to drop computing the size from NR_DEFAULT_CPUs and CRASH_MAX_MEMORY_RANGES and simply go with CRASH_HOTPLUG_ELFCOREHDR_SZ which simply specifies the buffer size, then I'm also good with that.

I still owe a much better explanation of how to size the elfcorehdr. I can use the comments and ideas from the discussion to provide the necessary insight when choosing this value, whether that be CRASH_MAX_MEMORY_RANGES or CRASH_HOTPLUG_ELFCOREHDR_SZ.

I'd prefer keeping CRASH_MAX_MEMORY_RANGES as that allow the maximum phdr
number value to be reflective of CPUs and/or memory; not all systems support
both CPU and memory hotplug. For example, I have queued up this change to
reflect this:

if (IS_ENABLED(CONFIG_HOTPLUG_CPU) || IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {

If you're going to keep CRASH_MAX_MEMORY_RANGES, then you can test only
that thing as it expresses the dependency on CONFIG_HOTPLUG_CPU and
CONFIG_MEMORY_HOTPLUG already.

If you end up making the number dynamic, then you could make that a
different Kconfig item which contains all that crash code as most of the
people won't need it anyway.

It is my intention to correct the CRASH_MAX_MEMORY_RANGES (if we keep it) as such:

config CRASH_MAX_MEMORY_RANGES
depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG

CRASH_MAX_MEMORY_RANGES should have never had CPU_HOTPLUG as a dependency; that was a cut-n-paste error on my part.

Hmm?

Thank you for the time and thought on this topic!
eric

Next message: Pierre Gondois: "[PATCH v2] sched/topology: Remove EM_MAX_COMPLEXITY limit"
Previous message: Guenter Roeck: "Re: [RFC][PATCH v2 00/31] timers: Use del_timer_shutdown() before freeing timers"
In reply to: Borislav Petkov: "Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support"
Next in thread: Borislav Petkov: "Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]