Re: [dm-devel] [4.7.0rc6] Page Allocation Failures with dm-crypt

From: Matthias Dahl
Date: Mon Jul 11 2016 - 10:47:26 EST

Hello Mike...

On 2016-07-11 15:30, Mike Snitzer wrote:

> But that is expected given you're doing an unbounded buffered write to
> the device. What isn't expected, to me anyway, is that the mm subsystem
> (or the default knobs for buffered writeback) would be so aggressive
> about delaying writeback.

Ok. But, and please correct me if I am wrong, I was under the impression
that only the file caches/buffers would be affected. Yet if I use free to
monitor the memory usage, it is the used memory that increases until it
consumes all memory, not the buffers/file caches. That is what I am
seeing here.
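To tell page cache growth apart from other memory use, it can help to look
at the raw /proc/meminfo fields (Buffers, Cached, Dirty, Writeback) rather
than only at the summary free prints. Below is a minimal sketch, assuming the
standard Linux field names; the function names are hypothetical, and the
"approx_used" figure is only an approximation, since different versions of
free compute "used" differently.

```python
FIELDS = ("MemTotal", "MemFree", "Buffers", "Cached", "Dirty", "Writeback")


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; most values are in KiB."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            # "Dirty:   1234 kB" -> 1234; unit-less lines like
            # "HugePages_Total:  0" are parsed the same way.
            info[key.strip()] = int(parts[0])
    return info


def summarise(info):
    """Pick out the writeback-related fields and add a rough 'used' figure.

    approx_used = MemTotal - MemFree - Buffers - Cached is an approximation
    (hypothetical helper); free(1) versions differ in how they define 'used'.
    """
    summary = {name: info.get(name, 0) for name in FIELDS}
    summary["approx_used"] = (
        info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]
    )
    return summary
```

On a live system this would be fed with the contents of /proc/meminfo,
sampled repeatedly while dd runs, to see whether Dirty/Writeback or the
remaining "used" memory is what grows.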

Also, if I use dd directly on the device without dm-crypt in between, there
is no problem. Sure, the buffers also grow hugely... but only the buffers.

> Why are you doing this test anyway? Such a large buffered write doesn't
> seem to accurately model any application I'm aware of (but obviously it
> should still "work").

It is not a test per se. I simply wanted to fill the partition with noise,
and doing it this way is faster than reading from /dev/urandom or anything
similar. ;-) That is how I stumbled over this issue in the first place.
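The difference between an unbounded and a bounded buffered write can be
sketched as follows: the same 512K writes as the dd run, but with an fsync
every few tens of MiB so dirty pages cannot pile up without limit. This is a
hypothetical helper, not what dd does; the 64 MiB sync interval is an
arbitrary assumption, and the target path (for example a dm-crypt mapping
under /dev/mapper) is left to the caller.

```python
import os

CHUNK = 512 * 1024  # 512K, the dd I/O size discussed in this thread


def fill_with_zeros(path, total_bytes, sync_every=64 * 1024 * 1024):
    """Write total_bytes of zeros to path in CHUNK-sized writes.

    Calling fsync every sync_every bytes bounds the amount of dirty page
    cache outstanding at any time (hypothetical helper, assumed interval).
    """
    view = memoryview(bytes(CHUNK))
    written = 0
    since_sync = 0
    # O_CREAT only matters for regular files; an existing device node is
    # opened as-is. No O_TRUNC, so nothing is truncated.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        while written < total_bytes:
            n = os.write(fd, view[: min(CHUNK, total_bytes - written)])
            written += n
            since_sync += n
            if since_sync >= sync_every:
                os.fsync(fd)  # flush dirty pages before writing more
                since_sync = 0
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

GNU dd's oflag=direct is a different trade-off: it bypasses the page cache
entirely instead of bounding it.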

> Now that is weird. Are you (or the distro you're using) setting any mm
> subsystem tunables to really broken values?

You can see those in my initial mail. I attached the kernel warnings, all
sysctl tunables and more. Maybe that helps.
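The writeback-related sysctls are the vm.dirty_* knobs under /proc/sys/vm.
Below is a minimal sketch for collecting them, with hypothetical function
names. It relies on the documented rule that dirty_bytes and dirty_ratio are
mutually exclusive: whichever was written last is in force, and the other
reads back as 0.

```python
import os

WRITEBACK_KNOBS = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_bytes",
    "dirty_background_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_writeback_knobs(base="/proc/sys/vm"):
    """Return {knob: int} for each writeback knob present under base."""
    values = {}
    for knob in WRITEBACK_KNOBS:
        try:
            with open(os.path.join(base, knob)) as f:
                values[knob] = int(f.read().strip())
        except FileNotFoundError:
            continue  # knob not present on this kernel
    return values


def foreground_limit(values):
    """Return (knob, value) for the foreground dirty limit in force.

    A non-zero dirty_bytes means the byte limit is active; otherwise
    dirty_ratio (a percentage) applies.
    """
    if values.get("dirty_bytes", 0) > 0:
        return ("dirty_bytes", values["dirty_bytes"])
    return ("dirty_ratio", values.get("dirty_ratio", 0))
```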

> What is your raid10's full stripesize?

4 disks in RAID10, with a stripe size of 64k.

> Is your dd IO size of 512K somehow triggering excess R-M-W cycles which
> is exacerbating the problem?

The partitions are properly aligned, and with that stripe size a 512K I/O
size should not cause any issue.
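The alignment argument is simple arithmetic. Below is a sketch under stated
assumptions: "stripe size of 64k" is taken to mean the md chunk size, and the
array is assumed to use md's near layout with 2 copies, so one full stripe
holds 4 / 2 = 2 distinct chunks, i.e. 128K of data. Since RAID10 mirrors
rather than keeping parity, the check is mainly about whether writes cover
whole stripes. The helper names are hypothetical.

```python
KIB = 1024


def full_stripe_bytes(chunk_bytes, n_disks, copies=2):
    """Distinct data covered by one full stripe of an md RAID10 (near layout).

    Each chunk is stored `copies` times, so a stripe across n_disks holds
    n_disks // copies distinct chunks. This sketch only handles n_disks
    being a multiple of copies.
    """
    if n_disks % copies:
        raise ValueError("sketch assumes n_disks is a multiple of copies")
    return chunk_bytes * (n_disks // copies)


def is_stripe_aligned(io_bytes, offset_bytes, stripe_bytes):
    """True if the I/O size and its starting offset are whole stripes."""
    return io_bytes % stripe_bytes == 0 and offset_bytes % stripe_bytes == 0
```

With 4 disks and a 64K chunk this gives a 128K full stripe, so a 512K write
starting at an aligned offset spans exactly 4 full stripes.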

In the meantime, I did some further tests: I created an ext2 file system on
the partition and a 60 GiB container image on it. I used that image with
dm-crypt, with the same parameters as before. No matter what I do there, I
cannot trigger the same behavior.

Maybe it is an interaction issue between dm-crypt and the software RAID.
But at this point, I have no idea how to diagnose or test it further. If
you could point me in any direction, that would be great...

With Kind Regards from Germany

Dipl.-Inf. (FH) Matthias Dahl | Software Engineer |
services: custom software [desktop, mobile, web], server administration