Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support

From: Eric DeVolder
Date: Wed Oct 12 2022 - 16:44:13 EST

On 10/12/22 15:19, Eric DeVolder wrote:

On 10/12/22 12:46, Borislav Petkov wrote:
On Sat, Oct 08, 2022 at 10:35:14AM +0800, Baoquan He wrote:
Memory hptlug is not limited by a certin or a max number of memory
regions, but limited by how large of the linear mapping range which
physical can be mapped into.

Memory hotplug is not limited by some abstract range but by the *actual*
possibility of how many DIMM slots on any motherboard can hotplug
memory. Certainly not 32K.

So you can choose a sane default which covers *all* actual systems out

We run here QEMU with the ability for 1024 DIMM slots. A DIMM can be any
reasonable power of 2 size, and then that DIMM is further divided into memblocks,
typically 128MiB.

So, for example, 1TiB requires 1024 DIMMs of 1GiB each with 128MiB memblocks, that results
in 8K possible memory regions. So just going to 4TiB reaches 32K memory regions.

This I can attest for virtualized DIMMs, not sure about other memory hotplug technologies
like virtio-mem or DynamicMemory. But it seems reasonable that those technologies could
also easily reach into these number ranges.


Oh, to be fair, if the above were fully populated, it would essentially coalescence
into a single reported region via crash_prepare_elf64_headers(). But in the sadistic
case, where every other memblock was offlined, that would result in the need to
report half of the memory regions via the elfcorehdr.


For the Kconfig CRASH_MAX_MEMORY_RANGES Eric added, it's meaningful to
me to set a fixed value which is enough in reality.

Yes, exactly.

For extreme testing with special purpose, it could be broken easily,
people need decide by self whether the CONFIG_CRASH_MAX_MEMORY_RANGES
is enlarged or not.

I don't want for people to decide on one more thing where they have to
go and read a bunch of specs just to know what is a good value. So we
should set a sane, *practical* upper limit and simply go with it.

Everything else is testing stuff and if you test the kernel, then you
can change limits and values and so on as you want to.