Re: [PATCH 0/6 v3] kvmalloc

From: Daniel Borkmann
Date: Fri Jan 27 2017 - 15:58:19 EST


On 01/27/2017 11:05 AM, Michal Hocko wrote:
On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
On 01/26/2017 02:40 PM, Michal Hocko wrote:
[...]
But realistically, how big is this problem really? Is it really worth
it? You said this is an admin only interface and admin can kill the
machine by OOM and other means already.

Moreover and I should probably mention it explicitly, your d407bd25a204b
reduced the likelyhood of oom for other reason. kmalloc used GPF_USER
previously and with order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER this
could indeed hit the OOM e.g. due to memory fragmentation. It would be
much harder to hit the OOM killer from vmalloc which doesn't issue
higher order allocation requests. Or have you ever seen the OOM killer
pointing to the vmalloc fallback path?

The case I was concerned about was from vmalloc() path, not kmalloc().
That was where the stack trace indicating OOM pointed to. As an example,
there could be really large allocation requests for maps where the map
has pre-allocated memory for its elements. Thus, if we get to the point
where we need to kill others due to shortage of mem for satisfying this,
I'd much much rather prefer to just not let vmalloc() work really hard
and fail early on instead.

I see, but as already mentioned, chances are that by the time you get
close to the OOM somebody else will hit the OOM before the vmalloc path
manages to free the allocated memory.

In my (crafted) test case, I was connected
via ssh and it each time reliably killed my connection, which is really
suboptimal.

F.e., I could also imagine a buggy or miscalculated map definition for
a prog that is provisioned to multiple places, which then accidentally
triggers this. Or if large on purpose, but we crossed the line, it
could be handled more gracefully, f.e. I could imagine an option to
falling back to a non-pre-allocated map flavor from the application
loading the program. Trade-off for sure, but still allowing it to
operate up to a certain extend. Granted, if vmalloc() succeeded without
trying hard and we then OOM elsewhere, too bad, but we don't have much
control over that one anyway, only about our own request. Reason I
asked above was whether having __GFP_NORETRY in would be fatal
somewhere down the path, but seems not as you say.

So to answer your second email with the bpf and netfilter hunks, why
not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
is not harmful though has only /partial/ effect right now and that full
support needs to be implemented in future. That would still be better
that not having it, imo, and the FIXME would make expectations clear
to anyone reading that code.

Well, we can do that, I just would like to prevent from this (ab)use
if there is no _real_ and _sensible_ usecase for it. Having a real bug

Understandable.

report or a fallback mechanism you are mentioning above would justify
the (ab)use IMHO. But that abuse would be documented properly and have a
real reason to exist. That sounds like a better approach to me.

But if you absolutely _insist_ I can change that.

Yeah, please do (with a big FIXME comment as mentioned), this originally
came from a real bug report. Anyway, feel free to add my Acked-by then.

Thanks again,
Daniel