Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers

From: John Hubbard
Date: Thu Jan 19 2017 - 03:38:07 EST




On 01/18/2017 12:21 AM, Michal Hocko wrote:
On Tue 17-01-17 21:59:13, John Hubbard wrote:

On 01/16/2017 11:51 PM, Michal Hocko wrote:
On Mon 16-01-17 13:57:43, John Hubbard wrote:


On 01/16/2017 01:48 PM, Michal Hocko wrote:
On Mon 16-01-17 13:15:08, John Hubbard wrote:


On 01/16/2017 11:40 AM, Michal Hocko wrote:
On Mon 16-01-17 11:09:37, John Hubbard wrote:


On 01/16/2017 12:47 AM, Michal Hocko wrote:
On Sun 15-01-17 20:34:13, John Hubbard wrote:
[...]
Is that "Reclaim modifiers" line still true, or is it a leftover from an
earlier approach? I am having trouble reconciling it with rest of the
patchset, because:

a) the flags argument below is effectively passed on to either kmalloc_node
(possibly adding, but not removing flags), or to __vmalloc_node_flags.

The above only says thos are _unsupported_ - in other words the behavior
is not defined. Even if flags are passed down to kmalloc resp. vmalloc
it doesn't mean they are used that way. Remember that vmalloc uses
some hardcoded GFP_KERNEL allocations. So while I could be really
strict about this and mask away these flags I doubt this is worth the
additional code.

I do wonder about passing those flags through to kmalloc. Maybe it is worth
stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides some
insulation from any future changes to the implementation of kmalloc, and it
also makes the documentation more believable.

I am not really convinced that we should take an extra steps for these
flags. There are no existing users for those flags and new users should
follow the documentation.

OK, let's just fortify the documentation ever so slightly, then, so that
users are more likely to do the right thing. How's this sound:

* Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. (Even
* though the current implementation passes the flags on through to kmalloc and
* vmalloc, that is done for efficiency and to avoid unnecessary code. The caller
* should not pass in these flags.)
*
* __GFP_REPEAT is supported, but only for large (>64kB) allocations.


? Or is that documentation overkill?

Dunno, it sounds like an overkill to me. It is telling more than
necessary. If we want to be so vocal about gfp flags then we would have
to say much more I suspect. E.g. what about __GFP_HIGHMEM? This flag is
supported for vmalloc while unsupported for kmalloc. I am pretty sure
there would be other gfp flags to consider and then this would grow
borringly large and uninteresting to the point when people simply stop
reading it. Let's just be as simple as possible.

Agreed, on the simplicity point: simple and clear is ideal. But here, it's
merely short, and not quite simple. :) People will look at that short bit
of documentation, and then notice that the flags are, in fact, all passed
right on through down to both kmalloc_node and __vmalloc_node_flags.

If you don't want too much documentation, then I'd be inclined to say
something higher-level, about the intent, rather than mentioning those two
flags directly. Because as it stands, the documentation contradicts what the
code does.

Feel free to suggest a better wording. I am, of course, open to any
changes.

OK, here's the best I've got, I tried to keep it concise, but (as you
suspected) I'm not sure it's actually any better than the original:

* Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL should not be passed in.
* Passing in __GFP_REPEAT is supported, but note that it is ignored for small
* (<=64KB) allocations, during the kmalloc attempt.

__GFP_REPEAT is fully
* honored for all allocation sizes during the second part: the vmalloc attempt.

this is not true to be really precise because vmalloc doesn't respect
the given gfp mask all the way down (look at the pte initialization).


I'm having some difficulty in locating that pte initialization part, am I on the wrong code path? Here's what I checked, before making the claim about __GFP_REPEAT being honored:

kvmalloc_node
__vmalloc_node_flags
__vmalloc_node
__vmalloc_node_range
__vmalloc_area_node
alloc_pages_node
__alloc_pages_node
__alloc_pages
__alloc_pages_nodemask
__alloc_pages_slowpath


...and __alloc_pages_slowpath does the __GFP_REPEAT handling:

/*
* Do not retry costly high order allocations unless they are
* __GFP_REPEAT
*/
if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
goto nopage;

thanks,
john h