From: Mikulas Patocka
Date: Mon Jul 11 2016 - 11:43:15 EST

On Mon, 11 Jul 2016, Ondrej Kozina wrote:

> > > On upstream mailing lists there have been reports of freezing systems
> > > due to OOM. Ondra (on CC) managed to reproduce this in-house; he'd like
> > > someone with mm skills to look at the problem, since he doesn't
> > > understand why OOM comes into play when >90% of 2GB swap is still free.
> > > Could you please take a look? It's following this email on upstream:
> > >
> > >
> > Hi Ondrej,
> >
> > before the OOM kill, several of them are in memory reclaim path, which
> > prevents it from freeing memory.
> >
> > satisfy atomic allocation (cf. /proc/sys/vm/min_free_kbytes). Have you
> > tried increasing that value?
> >
> > runs out. There might be a burst of atomic allocations that deplete the
> > reserve. What kind of workload is that?
> >
> > Jerome
> Hi Jerome,
>
> special. I've started a gcc build of a project in C++ in 3-4 threads so that I'd
> waste all physical memory to trigger it. I can build some simple utility to
> allocate memory in predefined chunks in some loop if it'd be of any help. It was
> really quite simple to trigger this.
>
> PS: Adding Mikulas (dm-crypt upstream) on CC in case he has anything to
> add.
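
The chunked allocation utility Ondrej offers to write could look like the following minimal sketch. The names hog_memory and release_memory, the 0xA5 fill pattern, and any chunk sizes you pass in are assumptions for illustration, not part of the original report.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical reproducer (assumed design): allocate `nchunks` blocks of
 * `chunk_bytes` each and write every byte, so the kernel must back them
 * with real pages and, under pressure, swap them out.
 * Returns the number of chunks allocated and touched; they are stored
 * in `out` so the memory stays in use until release_memory() is called.
 */
size_t hog_memory(size_t chunk_bytes, size_t nchunks, void **out)
{
    size_t done;

    assert(out != NULL || nchunks == 0);
    for (done = 0; done < nchunks; done++) {
        void *p = malloc(chunk_bytes);
        if (p == NULL)
            break;                    /* stop at the first failed allocation */
        memset(p, 0xA5, chunk_bytes); /* touch every page */
        out[done] = p;
    }
    return done;
}

/* Free the first `n` blocks returned by hog_memory(). */
void release_memory(void **blocks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(blocks[i]);
}
```

With Linux's default overcommit settings malloc() seldom returns NULL; the memory pressure (and eventually swapping or the OOM killer) shows up when memset() touches the pages.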

That allocation warning in wb_start_writeback was already silenced by
commit 78ebc2f7146156f488083c9e5a7ded9d5c38c58b. The warning in
drivers/virtio/virtio_ring.c:alloc_indirect could be silenced as well (the
driver falls back if the allocation fails, so this failure can't result in
loss of functionality).

The general problem is that the memory allocator does 16 retries to
allocate a page and then triggers the OOM killer (and it doesn't take into
account how much swap space is free or how many dirty pages were really
swapped out while it waited).

So it can prematurely trigger the OOM killer with any slow swap device
(including dm-crypt). Michal Hocko reworked the OOM killer in commit
0a0337e0d1d134465778a16f5cbea95086e8e9e0, but it still has the flaw that
it triggers OOM even when there is plenty of free swap space.
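
The effect can be illustrated with a deliberately simplified user-space model. It is a sketch, not the kernel's reclaim code in mm/page_alloc.c: the function simulate_reclaim, its parameters, and the idea of a fixed number of pages written to swap per reclaim pass are hypothetical; only the budget of 16 retries comes from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Retry budget taken from the text: 16 attempts before the OOM killer. */
#define MODEL_MAX_RECLAIM_RETRIES 16

/*
 * Hypothetical model, not kernel code. The allocation needs `pages_needed`
 * pages to be freed; each reclaim pass can push at most `pages_per_pass`
 * pages to swap (a small value models a slow device such as dm-crypt).
 * Returns true if the model ends in an OOM kill.
 */
bool simulate_reclaim(unsigned long pages_needed,
                      unsigned long pages_per_pass,
                      unsigned long free_swap_pages)
{
    unsigned long freed = 0;

    assert(pages_per_pass > 0);
    for (int pass = 1; pass <= MODEL_MAX_RECLAIM_RETRIES; pass++) {
        /* A page can only be swapped out if swap space is left. */
        unsigned long step = pages_per_pass < free_swap_pages
                                 ? pages_per_pass : free_swap_pages;
        free_swap_pages -= step;
        freed += step;
        if (freed >= pages_needed)
            return false; /* allocation can be satisfied: no OOM */
    }
    /* Budget exhausted: OOM, however much swap is still free. */
    return true;
}
```

With a slow device, simulate_reclaim(1000, 10, 500000) reports OOM after 16 passes even though 499840 swap pages are still free, while simulate_reclaim(100, 10, 50) reports OOM because swap really is exhausted.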

Michal, would you accept a change to the OOM killer that prevents it from
triggering when there is free swap space?
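
One possible shape for such a change, as a hypothetical sketch only and not a proposed patch: keep retrying past the usual budget while there is free swap and anonymous memory that could still be swapped out. struct swap_state, proposed_should_oom and the anon_pages field are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_RECLAIM_RETRIES 16

/* Hypothetical inputs; the names are illustrative, not kernel symbols. */
struct swap_state {
    unsigned long free_swap_pages;
    unsigned long total_swap_pages;
    unsigned long anon_pages; /* anonymous pages that could still be swapped out */
};

/* Returns true if the modified model would invoke the OOM killer. */
bool proposed_should_oom(unsigned failed_passes, const struct swap_state *s)
{
    assert(s->free_swap_pages <= s->total_swap_pages);

    if (failed_passes < MODEL_MAX_RECLAIM_RETRIES)
        return false; /* still within the usual retry budget */

    /* Proposed guard: don't OOM while anonymous memory can still go to swap. */
    if (s->free_swap_pages > 0 && s->anon_pages > 0)
        return false;

    return true;
}
```

The obvious trade-off in this model is that if swap writeback never makes progress (for example, on a stuck device), the guard alone keeps the allocator retrying indefinitely, so a real change would need some other bound.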