Re: [RFC 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically

From: Vlastimil Babka
Date: Thu Nov 24 2016 - 02:43:40 EST


On 11/23/2016 01:35 PM, Michal Hocko wrote:
On Wed 23-11-16 13:19:20, Vlastimil Babka wrote:
This makes some sense to me, but there might be unpleasant consequences,
e.g. due to allowing costly allocations without reserves.

I am not sure I understand. Did you mean with reserves? Anyway, my code

Yeah, with reserves/without watermarks checks. Sorry.

inspection shown that we are not really doing GFP_NOFAIL for costly
orders. This might change in the future but even if we do that then this
shouldn't add a risk of the reserves depletion, right?

Well it's true that it will be unlikely that high-order pages will exist at min watermark, but if they do, high-order page depletes more than order-0. Anyway we have the WARN_ON_ONCE on cosly nofail allocations, so at least this won't happen silently...

I guess only testing will show...

Also some comments below.
[...]
static inline struct page *
+__alloc_pages_nowmark(gfp_t gfp_mask, unsigned int order,
+ const struct alloc_context *ac)
+{
+ struct page *page;
+
+ page = get_page_from_freelist(gfp_mask, order,
+ ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
+ /*
+ * fallback to ignore cpuset restriction if our nodes
+ * are depleted
+ */
+ if (!page)
+ page = get_page_from_freelist(gfp_mask, order,
+ ALLOC_NO_WATERMARKS, ac);

Is this enough? Look at what __alloc_pages_slowpath() does since
e46e7b77c909 ("mm, page_alloc: recalculate the preferred zoneref if the
context can ignore memory policies").

this is a one time attempt to do the nowmark allocation. If we need to
do the recalculation then this should happen in the next round. Or am I
missing your question?

The next round no-watermarks allocation attempt in __alloc_pages_slowpath() uses different criteria than the new __alloc_pages_nowmark() callers. And it would be nicer to unify this as well, if possible.



...

- }
/* Exhausted what can be done so it's blamo time */
- if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
+ if (out_of_memory(&oc)) {

This removes the warning, but also the check for __GFP_NOFAIL itself. Was it
what you wanted?

The point of the check was to keep looping for __GFP_NOFAIL requests
even when the OOM killer is disabled (out_of_memory returns false). We
are accomplishing that by

*did_some_progress = 1;
^^^^ this

But oom disabled means that this line is not reached?

it is true we will not have the warning but I am not really sure we care
all that much. In any case it wouldn't be all that hard to check for oom
killer disabled and warn on in the allocator slow path.

thanks for having a look!