Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks

From: Qing Huang
Date: Tue Jun 05 2018 - 14:51:41 EST




On 6/4/2018 5:40 AM, Vlastimil Babka wrote:
On 06/04/2018 08:27 AM, Michal Hocko wrote:
On Fri 01-06-18 15:05:26, Qing Huang wrote:

On 6/1/2018 12:31 AM, Michal Hocko wrote:
On Thu 31-05-18 19:04:46, Qing Huang wrote:
On 5/31/2018 2:10 AM, Michal Hocko wrote:
On Thu 31-05-18 10:55:32, Michal Hocko wrote:
On Thu 31-05-18 04:35:31, Eric Dumazet wrote:
[...]
I merely copied/pasted from alloc_skb_with_frags() :/
I will have a look at it. Thanks!
OK, so this is an example of an incremental development ;).

__GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
high order allocations") to avoid invoking the OOM killer. Yet this was
not enough because fb05e7a89f50 ("net: don't wait for order-3 page
allocation") didn't want excessive reclaim for non-costly orders,
so it made the allocation completely NOWAIT while leaving __GFP_NORETRY
in place, which is now redundant. Should I send a patch?

Just curious, how about the GFP_ATOMIC flag? Would it work in a similar
fashion? We experimented with it a bit in the past but it seemed to cause
other issues in our tests. :-)
GFP_ATOMIC is a non-sleeping (aka no reclaim) context with access to
memory reserves. So the risk is that you deplete those reserves and
cause issues for other subsystems which need them as well.

By the way, we didn't encounter any OOM killer events. It seemed that
mlx4_alloc_icm() triggered the allocator slowpath.
We still had about 2GB of free memory, but it was highly fragmented.
Compaction was able to make reasonable forward progress for you.
But considering mlx4_alloc_icm is called with GFP_KERNEL or GFP_HIGHUSER,
the OOM killer is clearly possible as long as the order is lower
than 4.
The allocation was 256KB, so the order was much higher than 4. Compaction
seemed to be the root cause of our problem. It took too long to finish its
work while putting mlx4_alloc_icm to sleep in a heavily fragmented memory
situation. Will the NORETRY flag avoid the compaction ops and fail the 256KB
allocation immediately, so mlx4_alloc_icm can enter the adjustable lower
order allocation code path quickly?
Costly orders should only perform a light compaction attempt unless
__GFP_RETRY_MAYFAIL is used IIRC. CCing Vlastimil. So __GFP_NORETRY
shouldn't make any difference.
It's a bit more complicated. Costly allocations will try the light
compaction attempt first, even before reclaim. This is followed by
reclaim and a more costly compaction attempt. With __GFP_NORETRY, the
second compaction attempt is also only the light one, so the flag does
make a difference here.
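
For reference, the arithmetic behind "256KB" and "costly" can be sketched in
userspace C. This is only an illustration, assuming 4 KiB base pages;
size_to_order() is a hypothetical stand-in for the kernel's get_order(), and
PAGE_ALLOC_COSTLY_ORDER is set to 3 as in the kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages (PAGE_SHIFT = 12), e.g. x86-64. */
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Orders above this value are treated as "costly" by the page allocator. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Hypothetical userspace stand-in for get_order(): the smallest order
 * such that (PAGE_SIZE << order) covers at least `size` bytes.
 */
static unsigned int size_to_order(size_t size)
{
	unsigned int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* True if an allocation of `size` bytes falls into the costly range. */
static bool is_costly_size(size_t size)
{
	return size_to_order(size) > PAGE_ALLOC_COSTLY_ORDER;
}
```

Under these assumptions size_to_order(256 * 1024) evaluates to 6, so the
256KB ICM chunk is a costly-order request, while anything up to 32KB
(order 3) is not.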

Thanks for the clarification!

Looks like our production kernel is kinda old: neither __GFP_DIRECT_RECLAIM
nor __GFP_NORETRY is used in __alloc_pages_slowpath() there.
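
A rough userspace model of the "adjustable lower order" fallback mentioned
earlier in the thread, to show why a quick failure of the high-order attempt
matters. This is not the mlx4 code: alloc_with_fallback(), try_alloc_fn and
fragmented_alloc() are hypothetical names, and the allocator callback is a
toy that only models fragmentation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: true if a block of the given order was obtained. */
typedef bool (*try_alloc_fn)(unsigned int order, void *ctx);

/*
 * Try max_order first and step down one order at a time until an attempt
 * succeeds or min_order has also failed. Returns the order that succeeded,
 * or -1 if none did. The faster each failing attempt returns, the sooner
 * the loop reaches an order that can actually be satisfied.
 */
static int alloc_with_fallback(unsigned int max_order, unsigned int min_order,
			       try_alloc_fn try_alloc, void *ctx)
{
	unsigned int order;

	if (max_order < min_order)
		return -1;

	for (order = max_order; ; order--) {
		if (try_alloc(order, ctx))
			return (int)order;
		if (order == min_order)
			break;
	}
	return -1;
}

/* Toy model of fragmented memory: only orders <= *largest_free succeed. */
static bool fragmented_alloc(unsigned int order, void *ctx)
{
	const unsigned int *largest_free = ctx;

	return order <= *largest_free;
}
```

With largest_free set to 2, a request starting at order 6 fails four times
before succeeding at order 2, so the cost of each failed high-order attempt
is paid repeatedly.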