Re: [PATCH] mm: nobootmem: Correct alloc_bootmem semantics.
From: Johannes Weiner
Date: Fri May 04 2012 - 05:41:10 EST
On Thu, May 03, 2012 at 01:04:16PM -0400, David Miller wrote:
> From: Johannes Weiner <hannes@xxxxxxxxxxx>
> Date: Thu, 3 May 2012 17:28:41 +0200
>
> > On Wed, Apr 25, 2012 at 07:00:34PM -0400, David Miller wrote:
> >> From: Yinghai Lu <yinghai@xxxxxxxxxx>
> >> Date: Wed, 25 Apr 2012 15:46:42 -0700
> >>
> >> > On Wed, Apr 25, 2012 at 1:10 PM, David Miller <davem@xxxxxxxxxxxxx> wrote:
> >> >> @@ -298,13 +298,19 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
> >> >>         if (WARN_ON_ONCE(slab_is_available()))
> >> >>                 return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
> >> >>
> >> >> +again:
> >> >>         ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> >> >>                                          goal, -1ULL);
> >> >>         if (ptr)
> >> >>                 return ptr;
> >> >
> >> > If you want to be consistent to bootmem version.
> >> >
> >> > again label should be here instead.
> >>
> >> It is merely an artifact of implementation that the bootmem version
> >> doesn't try to respect the given node if the goal cannot be satisfied,
> >> and in fact I would classify that as a bug that needs to be fixed.
> >>
> >> Therefore, I believe the bootmem case is what needs to be adjusted
> >> instead.
> >
> > Now it does: node+goal, goal, node, anywhere
> >
> > whereas the memblock version of __alloc_bootmem_node_nopanic() also
> > still does: node+goal, goal, anywhere
> >
> > Your description suggests that the node should be higher prioritized
> > than the goal, which I understand as: node+goal, node, anywhere.
> >
> > Which do we actually want?
>
> I think the goal is what needs to be prioritized. An explicit goal usually
> has a requirement, like "I need physical memory in the low 32-bits" and if
> they specified an explicit node they really mean "and give me it on NUMA
> node X if you can." Hence the sequence:
>
> node+goal, goal, node, any
>
> the only other reasonable option would be:
>
> node+goal, node, goal, any
>
> but I think that doesn't match what people want when an explicit goal
> is specified. Do you?

Oh, I think that's what limit is for.  The goal is usually there to
allocate high-address memory for users that can deal with it and to
keep lowmem for users that can't.

For example, I can imagine sparsemem usemap allocation in the memory
hotplug case would prefer having the usemap on the same node as the
corresponding pgdat descriptor rather than allocating on any node above
the goal and possibly creating circular dependencies.

But that is quite rare/unlikely anyway, and I guess in most other
cases it's better to go for preventing lowmem exhaustion than to
preserve node locality.

So I'm fine with this priority order, but it's a judgement call.

I'll send patches to make everything use the same policy.
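
To make the agreed order concrete, here is a minimal userspace sketch of
"node+goal, goal, node, any" with the limit kept as a hard upper bound
throughout.  This is not the kernel code: struct region, find_fit(),
alloc_node_policy(), ANY_NODE and NO_LIMIT are all made up for
illustration, alignment is ignored, and 0 is assumed to mean "no fit".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_NODE (-1)        /* hypothetical: no node preference */
#define NO_LIMIT UINT64_MAX  /* hypothetical: no upper address bound */

/* Hypothetical free range [start, end) that lives on one NUMA node. */
struct region {
	int node;
	uint64_t start;
	uint64_t end;
};

/*
 * First fit: lowest address >= goal and entirely below limit, on the
 * given node (or on any node for ANY_NODE).  Returns 0 if nothing fits.
 */
static uint64_t find_fit(const struct region *map, size_t n, int node,
			 uint64_t size, uint64_t goal, uint64_t limit)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t base, top;

		if (node != ANY_NODE && map[i].node != node)
			continue;
		base = map[i].start > goal ? map[i].start : goal;
		top = map[i].end < limit ? map[i].end : limit;
		if (base < top && top - base >= size)
			return base;
	}
	return 0;
}

/*
 * Fallback order discussed in this thread.  The goal is a preference
 * that gets dropped in the last two steps; the limit is never relaxed.
 */
static uint64_t alloc_node_policy(const struct region *map, size_t n,
				  int node, uint64_t size, uint64_t goal,
				  uint64_t limit)
{
	uint64_t addr;

	assert(size > 0);

	addr = find_fit(map, n, node, size, goal, limit);      /* node+goal */
	if (!addr)
		addr = find_fit(map, n, ANY_NODE, size, goal, limit); /* goal */
	if (!addr)
		addr = find_fit(map, n, node, size, 0, limit);        /* node */
	if (!addr)
		addr = find_fit(map, n, ANY_NODE, size, 0, limit);    /* any */
	return addr;
}
```

With a toy map of node 0 at [0x1000, 0x100000) and node 1 at
[0x1000000, 0x2000000), a request for node 0 with goal 0x1000000 lands
on node 1 (the goal wins over the node), while a request for node 1 with
an unsatisfiable goal still lands on node 1 before falling back to any
node.  A request for node 1 with limit 0x100000 ends up on node 0,
because the limit is the one constraint that is never dropped.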