Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

From: Michal Hocko
Date: Wed May 17 2017 - 05:21:10 EST


On Sun 30-04-17 16:33:10, Cristopher Lameter wrote:
> On Wed, 26 Apr 2017, Vlastimil Babka wrote:
>
> > > Such an application typically already has such logic and executes a
> > > binding after discovering its numa node configuration on startup. It would
> > > have to be modified to redo that action when it gets some sort of a signal
> > > from the script telling it that the node config would be changed.
> > >
> > > Having this logic in the application instead of the kernel avoids all the
> > > kernel messes that we keep on trying to deal with and IMHO is much
> > > cleaner.
> >
> > That would be much simpler for us indeed. But we still IMHO can't
> > abruptly start denying page fault allocations for existing applications
> > that don't have the necessary awareness.
>
> We certainly can do that. The failure of the page faults are due to the
> admin trying to move an application that is not aware of this and is using
> mempols. That could be an error. Trying to move an application that
> contains both absolute and relative node numbers is definitely something
> that is potentiall so screwed up that the kernel should not muck around
> with such an app.
>
> Also user space can determine if the application is using memory policies
> and can then take appropriate measures (message to the sysadmin to eval
> tge situation f.e.) or mess aroud with the processes memory policies on
> its own.
>
> So this is certainly a way out of this mess.

So how are you going to distinguish VM_FAULT_OOM from an empty mempolicy
case in a raceless way?

--
Michal Hocko
SUSE Labs