Re: [PATCH -next v2] mm/hotplug: fix a null-ptr-deref during NUMA boot

From: Michal Hocko
Date: Tue May 28 2019 - 14:23:37 EST

[Sorry for a late reply]

On Thu 23-05-19 11:58:45, Pingfan Liu wrote:
> On Wed, May 22, 2019 at 7:16 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >
> > On Wed 22-05-19 15:12:16, Pingfan Liu wrote:
> > > But in fact, we already have for_each_node_state(nid, N_MEMORY) to
> > > cover this purpose.
> >
> > I do not really think we want to spread N_MEMORY outside of the core MM.
> > It is quite confusing IMHO.
> > .
> But it has already like this. Just git grep N_MEMORY.

I might be wrong but I suspect a closer review would reveal that the use
will be inconsistent or dubious so following the existing users is not
the best approach.

> > > Furthermore, changing the definition of online may
> > > break something in the scheduler, e.g. in task_numa_migrate(), where
> > > it calls for_each_online_node.
> >
> > Could you be more specific please? Why should numa balancing consider
> > nodes without any memory?
> >
> As my understanding, the destination cpu can be on a memory less node.
> BTW, there are several functions in the scheduler facing the same
> scenario, task_numa_migrate() is an example.

Even if the destination node is memoryless then any migration would fail
because there is no memory. Anyway I still do not see how using online
node would break anything.

> > > By keeping the node owning cpu as online, Michal's patch can avoid
> > > such corner case and keep things easy. Furthermore, if needed, the
> > > other patch can use for_each_node_state(nid, N_MEMORY) to replace
> > > for_each_online_node is some space.
> >
> > Ideally no code outside of the core MM should care about what kind of
> > memory does the node really own. The external code should only care
> > whether the node is online and thus usable or offline and of no
> > interest.
> Yes, but maybe it will pay great effort on it.

Even if that is the case it would be preferable because the current
situation is just not sustainable wrt maintenance cost. It is just too
simple to break the existing logic as this particular report outlines.
Michal Hocko