Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

From: Christoph Lameter
Date: Tue Apr 11 2017 - 13:24:39 EST


On Tue, 11 Apr 2017, Vlastimil Babka wrote:

> The root of the problem is that the cpuset's mems_allowed and mempolicy's
> nodemask can temporarily have no intersection, thus get_page_from_freelist()
> cannot find any usable zone. The current semantic for empty intersection is to
> ignore mempolicy's nodemask and honour cpuset restrictions. This is checked in
> node_zonelist(), but the racy update can happen after we already passed the

The fallback was only intended for softwall cpusets, i.e. cpusets whose
boundaries are not enforced in critical conditions. A hardwall cpuset
(CS_MEM_HARDWALL) should fail the allocation instead.

> This patch fixes the issue by having __alloc_pages_slowpath() check for empty
> intersection of cpuset and ac->nodemask before OOM or allocation failure. If
> it's indeed empty, the nodemask is ignored and allocation retried, which mimics
> node_zonelist(). This works fine, because almost all callers of

Well, that retry would need to be subject to the hardwall flag. The
allocation needs to fail for a hardwall cpuset.
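
To make the intended semantics concrete, here is a rough userspace sketch,
not kernel code. The names (nodemask_sketch_t, struct cpuset_sketch,
effective_nodes) are made up for illustration, and a plain 64-bit bitmask
stands in for the kernel's nodemask_t. It only models the decision for an
empty intersection: a softwall cpuset falls back to its own mems_allowed,
a hardwall cpuset fails the allocation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: one bit per NUMA node, not the kernel's nodemask_t. */
typedef uint64_t nodemask_sketch_t;

/* Hypothetical model of the two cpuset properties that matter here. */
struct cpuset_sketch {
	nodemask_sketch_t mems_allowed;	/* nodes the cpuset permits */
	bool mem_hardwall;		/* models a cpuset with CS_MEM_HARDWALL set */
};

/*
 * Decide which nodes an allocation may use.
 *
 * Returns true and stores the usable nodes in *out, or returns false
 * (leaving *out untouched) when the allocation should fail.
 */
static bool effective_nodes(const struct cpuset_sketch *cs,
			    nodemask_sketch_t policy_nodes,
			    nodemask_sketch_t *out)
{
	nodemask_sketch_t both = cs->mems_allowed & policy_nodes;

	/* Normal case: honour both the cpuset and the mempolicy. */
	if (both) {
		*out = both;
		return true;
	}

	/* Empty intersection on a hardwall cpuset: fail the allocation. */
	if (cs->mem_hardwall)
		return false;

	/* Empty intersection on a softwall cpuset: ignore the mempolicy. */
	*out = cs->mems_allowed;
	return true;
}
```

With mems_allowed covering nodes 0-1 and a mempolicy that only names
node 2, the softwall case yields nodes 0-1 and the hardwall case fails.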