Re: [PATCH v3 1/2] x86/mm: Add an option to change the padding used for the physical memory mapping

From: Travis, Mike
Date: Wed Sep 19 2018 - 19:05:40 EST

On 9/19/2018 7:10 AM, Masayoshi Mizuma wrote:
> On Wed, Sep 19, 2018 at 02:48:06PM +0200, Ingo Molnar wrote:
>> * Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>> On Wed, 19 Sep 2018, Ingo Molnar wrote:
>>>> * Masayoshi Mizuma <msys.mizuma@xxxxxxxxx> wrote:
>>>>> Ping...
>>>>> I would appreciate it if someone could review it, because this patch
>>>>> fixes a real memory hotplug issue...
>>>> Yeah, so I generally try to resist random new boot options that
>>>> work around real bugs, so please convince me that this patch
>>>> is the best option:

I wholeheartedly concur that boot options which are not easily
understood should be avoided. Ideally, the system should just work.
But on very large systems, these boot options are typically determined
either by automated scripts or by careful instructions to the trained
onsite customer engineers, who are required to "get it right".

>>>>> On Tue, Sep 04, 2018 at 11:11:40AM -0400, Masayoshi Mizuma wrote:
>>>>>> From: Masayoshi Mizuma <m.mizuma@xxxxxxxxxxxxxx>
>>>>>> If each node in the physical memory layout has a large space for hotplug,
>>>>>> the padding used for the physical memory mapping section is not enough.
>>>>>> For example, with this layout:
>>>>>> SRAT: Node 6 PXM 4 [mem 0x100000000000-0x13ffffffffff] hotplug
>>>>>> SRAT: Node 7 PXM 5 [mem 0x140000000000-0x17ffffffffff] hotplug
>>>>>> SRAT: Node 2 PXM 6 [mem 0x180000000000-0x1bffffffffff] hotplug
>>>>>> SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug
>>>>>> We can increase the padding with CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
>>>>>> however, the needed padding size depends on the system environment,
>>>>>> so a kernel option is better than changing the config.
>>>>>> Change log from v2:
>>>>>> - Simplify the description. As Baoquan said, this is similar to the SGI UV
>>>>>> issue, but a little different. Remove the SGI UV description.
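To put numbers on the example layout: each listed node spans 4 TiB
(0x40000000000 bytes), and the hotplug ranges run from 16 TiB up to
32 TiB (0x200000000000). The rough userspace sketch below computes the
padding needed above the end of boot-time RAM. The names
(struct srat_range, max_end_tb(), required_padding_tb()) are
hypothetical, not kernel APIs, and where boot RAM ends is an input,
since the log above does not show it.

```c
#include <stddef.h>
#include <stdint.h>

#define TB_SHIFT 40 /* 1 TiB = 2^40 bytes */

/* Hypothetical: one SRAT memory range as printed in the log (end inclusive). */
struct srat_range {
	int node;
	uint64_t start;
	uint64_t end;
};

/* The four hotplug ranges from the example layout. */
static const struct srat_range example[] = {
	{ 6, 0x100000000000ULL, 0x13ffffffffffULL },
	{ 7, 0x140000000000ULL, 0x17ffffffffffULL },
	{ 2, 0x180000000000ULL, 0x1bffffffffffULL },
	{ 3, 0x1c0000000000ULL, 0x1fffffffffffULL },
};

/* Highest address covered by any range, rounded up to whole TiB. */
static uint64_t max_end_tb(const struct srat_range *r, size_t n)
{
	uint64_t max_end = 0; /* exclusive end */

	for (size_t i = 0; i < n; i++)
		if (r[i].end + 1 > max_end)
			max_end = r[i].end + 1;
	return (max_end + (1ULL << TB_SHIFT) - 1) >> TB_SHIFT;
}

/* Padding (TiB) needed above boot-time RAM so all hotplug ranges fit. */
static uint64_t required_padding_tb(uint64_t boot_ram_end_tb,
				    const struct srat_range *r, size_t n)
{
	uint64_t top = max_end_tb(r, n);

	return top > boot_ram_end_tb ? top - boot_ram_end_tb : 0;
}
```

For instance, if boot-time RAM ended at 16 TiB, required_padding_tb(16,
example, 4) gives 16 TiB, more than the 10TB default padding used with
CONFIG_MEMORY_HOTPLUG.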
>>>> Could you please explain it a bit better where the higher padding requirement comes from?
>>>> 'system environment' is very opaque.
>>> As I understand it, it depends on the actual physical characteristics
>>> of the machine. So setting a fixed value in Kconfig might work for one, but
>>> not for others, and a command line option allows tweaking it at
>>> boot time while keeping a common kernel image.
>>> Ideally we would calculate that from SRAT, but AFAICT SRAT is not available
>>> at the point where this needs to be done.
> Yes, that's right. KASLR initialization happens early in the boot
> sequence, so SRAT is not available at that time.

Some facts are available via the x86 boot parameters structure
(struct boot_params) passed from the BIOS. Is there enough info in there
to help determine what the optimal value of this parameter should be?
Even a safe guess gets the system booted, and it can then be refined for
the next reboot.

>> Yeah, so could we at least do something like this:
>> - See whether using the maximum padding as the new default padding would work for everyone?
>> A bit more virtual memory used, or are there other costs as well?
> The current default padding size when CONFIG_MEMORY_HOTPLUG is set is 10TB.
> IMO, it should not be increased because that decreases the available
> entropy...
>> - Add checking code to the later SRAT case to at least _detect_ bad padding after the fact.
>> We don't utilize RAM with bad padding until that, right?
> I have the following idea. Does that make sense?
> Add a warning message which shows that the padding size is not enough
> for the physical memory mapping and tells the user the recommended
> padding size. The user can then change the padding size at the next
> reboot by adding the boot parameter.
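The proposed warning could look roughly like the following userspace
sketch. It assumes the direct mapping covers boot-time RAM (in whole
TiB) plus the padding, and that the highest hotplug address is known
once SRAT has been parsed. check_phys_padding() is a hypothetical name,
and fprintf() stands in for the kernel's pr_warn().

```c
#include <stdint.h>
#include <stdio.h>

#define TB_SHIFT 40 /* 1 TiB = 2^40 bytes */

/*
 * Hypothetical late check, run after SRAT parsing.
 * boot_ram_tb:   boot-time RAM, rounded up to TiB (assumption)
 * padding_tb:    padding that was used at KASLR time
 * srat_max_addr: exclusive end of the highest SRAT memory range
 * Returns the recommended padding in TiB, or 0 if the current one fits.
 */
static uint64_t check_phys_padding(uint64_t boot_ram_tb, uint64_t padding_tb,
				   uint64_t srat_max_addr)
{
	uint64_t mapped_tb = boot_ram_tb + padding_tb;
	uint64_t needed_tb = (srat_max_addr + (1ULL << TB_SHIFT) - 1) >> TB_SHIFT;

	if (needed_tb <= mapped_tb)
		return 0;

	uint64_t recommended = needed_tb - boot_ram_tb;

	fprintf(stderr,
		"warning: physical mapping padding %llu TB is too small; "
		"memory up to %llu TB needs a padding of %llu TB\n",
		(unsigned long long)padding_tb,
		(unsigned long long)needed_tb,
		(unsigned long long)recommended);
	return recommended;
}
```

With 16 TiB of boot RAM, the 10TB default and the example layout ending
at 0x200000000000, this would warn and recommend a 16 TiB padding.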

Again, leaving it solely up to the user is probably not the best
approach, either for single workstation users, who may not understand
what's up, or for large system users, who will just generate a customer
service call because something went wrong and they can't boot, or
their performance went down the drain. [Normally upgrades that change
the system config use an onsite CE, but that's not strictly required.]

So basically, a deterministic method of calculating what this padding
should be works best from a customer support angle. For an individual
workstation user, having the kernel determine what is correct for proper
operation is best.


>> - Add 'quirk' to the name of the boot parameter, to make it clear that this is really due to
>> suboptimal communication between the firmware and the kernel.
> I'm OK with adding 'quirk' to the boot parameter name.
> Thanks,
> Masa