Re: [PATCH] mm: make mem_map allocation continuous.

From: Yinghai Lu
Date: Wed Mar 12 2008 - 12:40:47 EST


On Wed, Mar 12, 2008 at 4:39 AM, Mel Gorman <mel@xxxxxxxxx> wrote:
> On (11/03/08 09:14), Ingo Molnar didst pronounce:
>
>
> >
> > * Yinghai Lu <yhlu.kernel@xxxxxxxxx> wrote:
> >
> > > [PATCH] mm: make mem_map allocation continuous.
> > >
> > > vmemmap allocation current got
> > > [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
> > > [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001800000 on node 0
> > > [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001c00000 on node 0
> > > [ffffe20000600000-ffffe200007fffff] PMD ->ffff810002000000 on node 0
> > > [ffffe20000800000-ffffe200009fffff] PMD ->ffff810002400000 on node 0
> > > ...
> > >
> > > there is 2M hole between them.
> > >
> > > the rootcause is that usemap (24 bytes) will be allocated after every 2M
> > > mem_map. and it will push next vmemmap (2M) to next align (2M).
> > >
> > > solution:
> > > try to allocate mem_map continously.
> > >
> > > after patch, will get
> > > [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
> > > [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001600000 on node 0
> > > [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001800000 on node 0
> > > [ffffe20000600000-ffffe200007fffff] PMD ->ffff810001a00000 on node 0
> > > [ffffe20000800000-ffffe200009fffff] PMD ->ffff810001c00000 on node 0
> > > ...
> > > and usemap will share in page because of they are allocated continuously too.
> > > sparse_early_usemap_alloc: usemap = ffff810024e00000 size = 24
> > > sparse_early_usemap_alloc: usemap = ffff810024e00080 size = 24
> > > sparse_early_usemap_alloc: usemap = ffff810024e00100 size = 24
> > > sparse_early_usemap_alloc: usemap = ffff810024e00180 size = 24
> > > ...
> > >
> > > so we make the bootmem allocation more compact and use less memory for usemap.
> > >
> > > Signed-off-by: Yinghai Lu <yhlu.kernel@xxxxxxxxx>
> >
> > very nice fix!
> >
>
> Agreed, good work.
>
>
> > i suspect this patch should go via -mm.
> >
> > > usemap = alloc_bootmem_node(NODE_DATA(nid), usemap_size());
> > > + printk(KERN_INFO "sparse_early_usemap_alloc: usemap = %p size = %ld\n", usemap, usemap_size());
> >
> > this should be in a separate patch.
> >
>
> Should this be KERN_DEBUG instead of KERN_INFO?
Yes. I will change it to KERN_DEBUG or remove it.
>
> I don't have the original mail because I got unsubscribed from the lists
> a few days ago and didn't notice (have been having mail issues) so
> pardon awkward cut & pastes
>
>
> > +/* section_map pointer array is 64k */
> > +static __initdata struct page *section_map[NR_MEM_SECTIONS];
>
> The size of this varies depending on architecture so the comment may be
> misleading. Maybe a comment like the following would be better?
>
> /*
> * The portions of the mem_map used by SPARSEMEM are allocated in
> * batch and temporarily stored in this array. When sparse_init()
> * completes, the array is discarded
> */
Yes. For x86_64 it is (1<<13)*8 bytes at most; other architectures should need much less.
>
> I can see why you file-scoped it because its too large for the stack but
> would it be better to allocate it from bootmem instead? It is
> available by the time you need to use the array.
Yes.
That needs to go on top of another patch I sent out yesterday, which makes
free_bootmem_core() handle out-of-range inputs.

Then we could use alloc_bootmem() and

	for_each_online_node(node)
		free_bootmem_node(node, section_map, size);

I will send a delta to Andrew.

YH