Re: can't oom-kill zap the victim's memory?
From: Eric W. Biederman
Date: Tue Oct 06 2015 - 11:00:56 EST
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
> On Tue, Oct 6, 2015 at 9:49 AM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> The basic fact remains: kernel allocations are so important that
>> rather than fail, you should kill user space. Only kernel allocations
>> that *explicitly* know that they have fallback code should fail, and
>> they should just do the __GFP_NORETRY.
If you have reached the point of killing userspace you might as well
panic the box. Userspace will recover more cleanly and more quickly.
The oom-killer is like an oops. Nice for debugging but not something
you want on a production workload.
> To be clear: "big" orders (I forget if the limit is at order-3 or
> order-4) do fail much more aggressively. But no, we do not limit retry
> to just order-0, because even small kmalloc sizes tend to often do
> order-1 or order-2 just because of memory packing issues (ie trying to
> pack into a single page wastes too much memory if the allocation sizes
> don't come out right).
I am not asking that we limit retry to just order-0 pages. I am asking
that we limit the oom-killer on failure to just order-0 pages.
> So no, order-0 isn't special. 1/2 are rather important too.
That is a justification for retrying. That is not a justification for
killing the box.
> [ Checking /proc/slabinfo: it looks like several slabs are order-3,
> for things like files_cache, signal_cache and sighand_cache for me at
> least. So I think it's up to order-3 that we basically need to
> consider "we'll need to shrink user space aggressively unless we have
> an explicit fallback for the allocation" ]
What I know is that order-3 is definitely too big. I had 4G of RAM
free. I needed 16K to expand the fd table. The box died. That is
not good.
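For reference, the arithmetic behind that 16K request can be sketched as
below. This is only a userspace illustration, not kernel code: the 4 KiB
page size is an assumption (typical for x86, not stated above), and
sketch_order_for_size() is a hypothetical helper named for this example.
Under that assumption a 16K allocation is 4 contiguous pages, i.e. order-2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (typical on x86); not stated in the mail. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Hypothetical helper: return the smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size, i.e. how many times the
 * page size must be doubled to cover the request.
 */
unsigned int sketch_order_for_size(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	/* Reject sizes that would make the doubling below overflow. */
	assert(size > 0 && size <= SIZE_MAX / 2);

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}
```

With these assumptions, sketch_order_for_size(16 * 1024) evaluates to 2,
which falls inside the order-[123] range under discussion.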
We have static checkers now; failure to check and handle errors tends to
be caught.
So yes, for the rare case of order-[123] allocations failing we should
return the failure to the caller. The kernel can handle it. Userspace
can handle just about anything better than random processes dying.
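A rough sketch of the policy being asked for, as a standalone decision
function. This is not the page allocator's code; the enum, the function
name, and the retries_left counter are all hypothetical, and the noretry
flag merely stands in for a caller passing __GFP_NORETRY. It only
illustrates the rule: keep retrying for any order, but let only an
exhausted order-0 allocation reach the oom-killer.

```c
#include <stdbool.h>

/* Hypothetical outcomes for an allocation attempt that just failed. */
enum sketch_action {
	SKETCH_RETRY,		/* reclaim/compact and try again */
	SKETCH_OOM_KILL,	/* invoke the oom-killer */
	SKETCH_FAIL		/* return the failure to the caller */
};

/*
 * Hypothetical policy sketch:
 *  - callers with explicit fallback code (noretry) fail immediately;
 *  - otherwise retry while the retry budget lasts, whatever the order;
 *  - once retries are exhausted, only order-0 escalates to the
 *    oom-killer; order-1 and above return the failure instead.
 */
enum sketch_action sketch_on_alloc_failure(unsigned int order, bool noretry,
					   unsigned int retries_left)
{
	if (noretry)
		return SKETCH_FAIL;
	if (retries_left > 0)
		return SKETCH_RETRY;
	if (order == 0)
		return SKETCH_OOM_KILL;
	return SKETCH_FAIL;
}
```

Under this sketch the order-2 fd table expansion above would have been
retried and then failed back to its caller instead of killing processes.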
Eric