Re: [RFC 0/2] Delay initializing of large sections of memory

From: Nathan Zimmer
Date: Fri Jun 21 2013 - 15:19:07 EST

On 06/21/2013 02:10 PM, Yinghai Lu wrote:
> On Fri, Jun 21, 2013 at 11:50 AM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>> On Fri, Jun 21, 2013 at 11:44:22AM -0700, Yinghai Lu wrote:
>>> On Fri, Jun 21, 2013 at 10:03 AM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>>>> On 06/21/2013 09:51 AM, Greg KH wrote:
>>>>
>>>> I suspect the cutoff for this should be a lot lower than 8 TB even, more
>>>> like 128 GB or so. The only concern is to not set the cutoff so low
>>>> that we can end up running out of memory or with suboptimal NUMA
>>>> placement just because of this.
>>> I would suggest another way:
>>> only boot the system with boot node (include cpu, ram and pci root buses).
>>> then after boot, could add other nodes.
>> What exactly do you mean by "after boot"? Often, the boot process of
>> userspace needs those additional cpus and ram in order to initialize
>> everything (like the pci devices) properly.
> I mean for Intel cpu have cpu and memory controller and IIO.
> every IIO is one peer pci root bus.
> So scan root bus that are not with boot node later.
>
> in this way we can keep all numa etc on the place when online ram, cpu, pci...
>
> For example if we have 32 sockets system, most time for boot is with *BIOS*
> instead of OS. In those kind of system boot is like this way:
> only first two sockets get booted from bios to OS.
> later use hot add every other two sockets.
>
> that will also make BIOS simpler, and it need to support hot-add for
> services purpose anyway.


Yes, the hot-add path was one option we looked at, and it did shorten boot times, but my goal here is to get from power-on to having the full machine available as quickly as possible. Several clients need significant portions of RAM for their key workloads, so that guided my thinking on this patch.