Re: [PATCH 1/5] x86, gfp: Cache best near node for memory allocation.

From: Jiang Liu
Date: Tue Aug 04 2015 - 04:06:12 EST


On 2015/8/4 11:36, Tang Chen wrote:
> Hi TJ,
>
> Sorry for the late reply.
>
> On 07/16/2015 05:48 AM, Tejun Heo wrote:
>> ......
>> so in initialization phase makes no sense any more. The best near online
>> node for each cpu should be cached somewhere.
>> I'm not really following. Is this because the now offline node can
>> later come online and we'd have to break the constant mapping
>> invariant if we update the mapping later? If so, it'd be nice to
>> spell that out.
>
> Yes. Will document this in the next version.
>
>>> ......
>>> +int get_near_online_node(int node)
>>> +{
>>> + return per_cpu(x86_cpu_to_near_online_node,
>>> + cpumask_first(&node_to_cpuid_mask_map[node]));
>>> +}
>>> +EXPORT_SYMBOL(get_near_online_node);
>> Umm... this function is sitting on a fairly hot path and scanning a
>> cpumask each time. Why not just build a numa node -> numa node array?
>
> Indeed. Will avoid scanning a cpumask.
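
The node -> node array suggested above could look roughly like the following. This is only a minimal userspace sketch, not code from the patch: every sketch_* name, the four-node size, and the SLIT-like distance table are assumptions made for illustration. The map is built once (or again on a node online/offline event), so the hot-path lookup is a single array read instead of a cpumask scan.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_MAX_NODES 4	/* hypothetical node count */

/* Hypothetical state: which nodes are online, and a SLIT-like distance table. */
static bool sketch_node_online[SKETCH_MAX_NODES] = { true, false, true, false };
static const int sketch_distance[SKETCH_MAX_NODES][SKETCH_MAX_NODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 40, 30 },
	{ 30, 40, 10, 20 },
	{ 40, 30, 20, 10 },
};

/* node -> nearest online node, or -1 if no node is online. */
static int sketch_near_online_node[SKETCH_MAX_NODES];

/* Slow path: recompute the whole map, e.g. at boot or on node hotplug. */
static void sketch_build_near_online_map(void)
{
	for (int node = 0; node < SKETCH_MAX_NODES; node++) {
		int best = -1;
		int best_dist = INT_MAX;

		for (int other = 0; other < SKETCH_MAX_NODES; other++) {
			if (!sketch_node_online[other])
				continue;
			if (sketch_distance[node][other] < best_dist) {
				best_dist = sketch_distance[node][other];
				best = other;
			}
		}
		sketch_near_online_node[node] = best;
	}
}

/* Hot path: one array read, no cpumask scan. */
static int sketch_get_near_online_node(int node)
{
	assert(node >= 0 && node < SKETCH_MAX_NODES);
	return sketch_near_online_node[node];
}
```

The sketch does not address how readers are synchronized against a rebuild of the map; that question is raised further down in the thread.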
>
>> ......
>>
>>> static inline struct page *alloc_pages_exact_node(int nid, gfp_t
>>> gfp_mask,
>>> unsigned int order)
>>> {
>>> - VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>>> + VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>>> +
>>> +#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
>>> + if (!node_online(nid))
>>> + nid = get_near_online_node(nid);
>>> +#endif
>>> return __alloc_pages(gfp_mask, order, node_zonelist(nid,
>>> gfp_mask));
>>> }
>> Ditto. Also, what are the synchronization rules for NUMA node
>> on/offlining? If you end up updating the mapping later, how would
>> that be synchronized against the above usages?
>
> I think the near online node map should be updated when node online/offline
> happens. But about this, I think the current numa code has a little
> problem.
>
> As you know, firmware info binds a set of CPUs and memory to a node. But
> at boot time, if the node has no memory (a memory-less node), it won't
> be online.
> But the CPUs on that node are available, and bound to the near online node.
> (Here, I mean numa_set_node(cpu, node).)
>
> Why does the kernel do this? I think it is to ensure that we can
> allocate memory
> successfully by calling functions like alloc_pages_node() and
> alloc_pages_exact_node().
> For these two functions, any CPU should be bound to a node that has memory
> so that
> memory allocation can be successful.
>
> That means, for a memory-less node at boot time, CPUs on the node are
> online,
> but the node is not online.
>
> That also means, "the node is online" is equivalent to "the node has memory".
> Actually, a lot
> of code in the kernel relies on this rule.
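
The two readings of "online" discussed here can be put side by side. The following is a minimal userspace sketch, not kernel code: the struct and both sketch_* predicates are hypothetical names introduced only to show where the rules disagree, namely on a node that has online CPUs but no memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a NUMA node for illustration only. */
struct sketch_node {
	int online_cpus;	/* CPUs currently online on this node */
	unsigned long pages;	/* pages of memory present on this node */
};

/* Rule 1: "the node is online" means "the node has memory". */
static bool sketch_online_if_memory(const struct sketch_node *n)
{
	assert(n != NULL);
	return n->pages > 0;
}

/* Rule 2: the node is online if it has online CPUs or memory. */
static bool sketch_online_if_cpus_or_memory(const struct sketch_node *n)
{
	assert(n != NULL);
	return n->online_cpus > 0 || n->pages > 0;
}
```

For a node with, say, 4 online CPUs and 0 pages, the first predicate returns false and the second returns true, which is exactly the memory-less node case under discussion.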
>
>
> But,
> 1) in cpu_up(), it will try to online a node, and it doesn't check if
> the node has memory.
> 2) in try_offline_node(), it offlines CPUs first, and then the memory.
>
> This behavior looks a little weird, or let's say it is ambiguous. It
> seems that a NUMA node
> consists of CPUs and memory. So if the CPUs are online, the node should
> be online.
Hi Chen,
I have posted a patch set to enable memoryless nodes on x86,
will repost it for review:) Hope it helps to solve this issue.
Thanks!
Gerry

>
> And also,
> The main purpose of this patch-set is to make the cpuid <-> nodeid
> mapping persistent.
> After this patch-set, alloc_pages_node() and alloc_pages_exact_node()
> won't depend on
> cpuid <-> nodeid mapping any more. So the node should be online if the
> CPUs on it are
> online. Otherwise, we cannot set up the interfaces of CPUs under /sys.
>
>
> Unfortunately, since I don't have a machine with a memory-less node, I
> cannot reproduce
> the problem right now.
>
> How do you think the node online behavior should be changed?
>
> Thanks.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/