Re: Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

From: Matthias Dahl
Date: Tue Jul 12 2016 - 08:42:20 EST


Hello Michal...

On 2016-07-12 13:49, Michal Hocko wrote:

I am not a storage expert (not even mention dm-crypt). But what those
counters say is that the IO completion doesn't trigger so the
PageWriteback flag is still set. Such a page is not reclaimable
obviously. So I would check the IO delivery path and focus on the
potential dm-crypt involvement if you suspect this is a contributing
factor.

Sounds reasonable... except that I have no clue how to trace that with
the limited means I have at my disposal right now and with the limited
knowledge I have of the kernel internals. ;-)

Who is consuming those objects? Where is the rest 70% of memory hiding?

Is there any way to get a more detailed listing of where the memory is
spent while dd is running? Something I could pipe every 500ms or so for
later analysis or so?

Writer will get throttled but the concurrent memory consumer will not
normally. So you can end up in this situation.

Hm, okay. I am still confused though: If I, for example, let dd do the
exact same thing on a raw partition on the RAID10, nothing like that
happens. Wouldn't we have the same race and problem then too...? It is
only with dm-crypt in-between that all of this shows itself. But I do
somehow suspect the RAID10 Intel Rapid Storage to be the cause or at
least partially.

Like I said, if you have any pointers how I could further trace this
or figure out who is exactly consuming what memory, that would be very
helpful... Thanks.

So long,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
services: custom software [desktop, mobile, web], server administration