On Fri, 01 May 2015 20:09:21 -0400 Waiman Long<waiman.long@xxxxxx> wrote:
> On 05/01/2015 06:02 PM, Waiman Long wrote:
> > Bad news!...
> > I tried your patch on a 24-TB DragonHawk and got an out of memory
> > panic. The kernel log messages were:
> > [ 81.360287] [<ffffffff8151b0c9>] dump_stack+0x68/0x77
> > [ 81.365942] [<ffffffff8151ae1e>] panic+0xb9/0x219
> > [ 81.371213] [<ffffffff810785c3>] ? __blocking_notifier_call_chain+0x63/0x80
> > [ 81.378971] [<ffffffff811384ce>] __out_of_memory+0x34e/0x350
> > [ 81.385292] [<ffffffff811385ee>] out_of_memory+0x5e/0x90
> > [ 81.391230] [<ffffffff8113ce9e>] __alloc_pages_slowpath+0x6be/0x740
> > [ 81.398219] [<ffffffff8113d15c>] __alloc_pages_nodemask+0x23c/0x250
> > [ 81.405212] [<ffffffff81186346>] kmem_getpages+0x56/0x110
> > [ 81.411246] [<ffffffff81187f44>] fallback_alloc+0x164/0x200
> > [ 81.417474] [<ffffffff81187cfd>] ____cache_alloc_node+0x8d/0x170
> > [ 81.424179] [<ffffffff811887bb>] kmem_cache_alloc_trace+0x17b/0x240
> > [ 81.431169] [<ffffffff813d5f3a>] init_memory_block+0x3a/0x110
> > [ 81.437586] [<ffffffff81b5f687>] memory_dev_init+0xd7/0x13d
> > [ 81.443810] [<ffffffff81b5f2af>] driver_init+0x2f/0x37
> > [ 81.449556] [<ffffffff81b1599b>] do_basic_setup+0x29/0xd5
> > [ 81.455597] [<ffffffff81b372c4>] ? sched_init_smp+0x140/0x147
> > [ 81.462015] [<ffffffff81b15c55>] kernel_init_freeable+0x20e/0x297
> > [ 81.468815] [<ffffffff81512ea0>] ? rest_init+0x80/0x80
> > [ 81.474565] [<ffffffff81512ea9>] kernel_init+0x9/0xf0
> > [ 81.480216] [<ffffffff8151f788>] ret_from_fork+0x58/0x90
> > [ 81.486156] [<ffffffff81512ea0>] ? rest_init+0x80/0x80
> > [ 81.492350] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
> > [ 81.492350]
> > -Longman
>
> I increased the pre-initialized memory per node in update_defer_init()
> of mm/page_alloc.c from 2G to 4G. Now I am able to boot the 24-TB
> machine without error. The 12-TB has 0.75TB/node, while the 24-TB
> machine has 1.5TB/node. I would suggest something like pre-initializing
> 1G per 0.25TB/node. In this way, it will scale properly with the memory
> size.

We're using more than 2G before we've even completed do_basic_setup()?
Where did it all go?

> Before the patch, the boot time from elilo prompt to ssh login was 694s.
> After the patch, the boot up time was 346s, a saving of 348s (about 50%).

Having to guesstimate the amount of memory which is needed for a
successful boot will be painful. Any number we choose will be wrong
99% of the time.

If the kswapd threads have started, all we need to do is to wait: take
a little nap in the allocator's page==NULL slowpath.
I'm not seeing any reason why we can't start kswapd much earlier -
right at the start of do_basic_setup()?
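
For reference, the "1G per 0.25TB/node" rule suggested in the quoted text can be written out as a small calculation. This is only a sketch under stated assumptions: preinit_bytes_for_node() is a hypothetical helper, not the code in update_defer_init(); rounding partial 0.25TB steps up and capping at the node size are my assumptions; and TB/G are taken as 2^40 and 2^30 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)   /* assumption: "G" means 2^30 bytes */
#define TIB (UINT64_C(1) << 40)   /* assumption: "TB" means 2^40 bytes */

/*
 * Hypothetical helper (not the real update_defer_init() logic):
 * pre-initialize 1G for every 0.25TB of memory on a node.
 */
uint64_t preinit_bytes_for_node(uint64_t node_bytes)
{
	const uint64_t step = TIB / 4;	/* 0.25TB of node memory */
	uint64_t steps, want;

	/* Guard the round-up addition below against overflow. */
	assert(node_bytes <= UINT64_MAX - (step - 1));

	/* Assumption: a partial 0.25TB step still earns a full 1G. */
	steps = (node_bytes + step - 1) / step;
	want = steps * GIB;

	/* Never ask for more than the node actually has. */
	return want < node_bytes ? want : node_bytes;
}
```

With these assumptions, the 0.75TB/node machine would pre-initialize 3G per node and the 1.5TB/node machine 6G per node. It is still a guess at boot-time demand, which is the concern raised above.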