Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

From: Matthias Dahl
Date: Tue Jul 12 2016 - 04:27:50 EST


Hello,

I already posted this issue to linux-mm, linux-kernel and dm-devel a
few days ago. After further investigation, it seems that the issue is
somehow related to the fact that I am using an Intel Rapid Storage
RAID10, so I am summarizing everything again in this mail and including
linux-raid. Sorry for the noise... :(

I am currently setting up a new machine (since my old one broke down)
and I ran into a lot of "Unable to allocate memory on node -1" warnings
while using dm-crypt. I have attached as much of the full log as I could
recover.

The encrypted device sits on a RAID10 (software RAID, Intel Rapid
Storage). I am currently limited to testing via Linux live images since
the machine is not properly set up yet, but I ran my tests across
several of them.

Steps to reproduce are:

1)
cryptsetup -s 512 -d /dev/urandom -c aes-xts-plain64 open --type plain /dev/md126p5 test-device

2)
dd if=/dev/zero of=/dev/mapper/test-device status=progress bs=512K

While dd is running, monitoring memory usage with free shows that used
memory increases rapidly. After just a few seconds the system is out of
memory, page allocation failures start to be reported, and the OOM
killer gets involved.
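A minimal sketch of a finer-grained alternative to watching free,
assuming a POSIX shell on Linux; watch_meminfo is a hypothetical helper
name, not something used in the tests above. It samples MemFree, Dirty
and Writeback from /proc/meminfo, which makes it easier to see whether
dirty page cache is piling up faster than it is written back.

```shell
# Hypothetical helper: print selected /proc/meminfo fields COUNT times,
# sleeping INTERVAL seconds between samples. Run it in a second terminal
# while the dd from step 2 is writing.
watch_meminfo() {
  count=${1:-60}      # number of samples (default 60)
  interval=${2:-1}    # seconds between samples (default 1)
  i=0
  while [ "$i" -lt "$count" ]; do
    # The trailing colon keeps "WritebackTmp:" from matching "Writeback".
    grep -E '^(MemFree|Dirty|Writeback):' /proc/meminfo
    printf '%s\n' '----'   # separator between samples
    i=$((i + 1))
    # Skip the sleep after the last sample.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, watch_meminfo 120 1 prints two minutes of one-second
samples.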

I have also seen this behavior when running mkfs.ext4 on the same
device (at least with e2fsprogs 1.43.1).

Using direct I/O works fine and does not cause any issue. The problem
also does not occur if dm-crypt is taken out of the picture.

I did further tests:

1) dd block size has no influence on the issue whatsoever
2) using dm-crypt on an image file located on an ext2 filesystem on the
RAID10 works fine
3) using an external hard disk (connected through USB 3) with two
partitions, set up as either RAID1 or RAID10 via Linux software RAID
with dm-crypt on top, also works fine

But as soon as I use dm-crypt on the Intel Rapid Storage RAID10, the
issue is 100% reproducible.

I tested all of this on a Fedora Rawhide live image, as I am still in
the process of setting up the new machine. The images are available for
download here:

download.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/iso/

The machine itself has 32 GiB of RAM (plenty), no swap (live image)
and is a 6700K on a Z170 chipset. The kernel is the default one provided
with the live image, which right now is a very recent git snapshot
between 4.7.0-rc6 and rc7. However, the issue also shows up on 4.4.8
and 4.5.5.

The stripe size of the RAID10 is 64k, if that matters.

I am now pretty much out of ideas about what else to test and where the
problem could stem from. Suffice it to say that this has shaken my trust
in this particular setup. I hope I can help find the cause.

If there is anything I can do to help, please let me know.

Also, since I am not subscribed to the lists right now (I have to make do
with a crappy webmail interface until everything is set up), please Cc
me accordingly. Thanks a lot.

With Kind Regards from Germany,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
services: custom software [desktop, mobile, web], server administration

Personalities : [raid10]
md126 : active raid10 sda[3] sdb[2] sdc[1] sdd[0]
3907023872 blocks super external:/md127/0 64K chunks 2 near-copies [4/4] [UUUU]

md127 : inactive sdc[3](S) sdb[2](S) sda[1](S) sdd[0](S)
10064 blocks super external:imsm

unused devices: <none>

nr_free_pages 7943696
nr_alloc_batch 5873
nr_inactive_anon 296
nr_active_anon 105347
nr_inactive_file 29921
nr_active_file 56980
nr_unevictable 13140
nr_mlock 2716
nr_anon_pages 107204
nr_mapped 27670
nr_file_pages 98500
nr_dirty 14
nr_writeback 0
nr_slab_reclaimable 8887
nr_slab_unreclaimable 16975
nr_page_table_pages 7137
nr_kernel_stack 490
nr_unstable 0
nr_bounce 0
nr_vmscan_write 9828326
nr_vmscan_immediate_reclaim 67360474
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 593
nr_dirtied 506466654
nr_written 506466595
nr_pages_scanned 0
numa_hit 5258670960
numa_miss 0
numa_foreign 0
numa_interleave 38217
numa_local 5258670960
numa_other 0
workingset_refault 336993
workingset_activate 61553
workingset_nodereclaim 7919435
nr_anon_transparent_hugepages 0
nr_free_cma 0
nr_dirty_threshold 1592267
nr_dirty_background_threshold 795161
pgpgin 10489537
pgpgout 2025884337
pswpin 0
pswpout 0
pgalloc_dma 684558
pgalloc_dma32 328009673
pgalloc_normal 5767713958
pgalloc_movable 0
pgfree 6106436104
pgactivate 813221
pgdeactivate 1284082
pgfault 1653795
pgmajfault 46351
pglazyfreed 0
pgrefill_dma 0
pgrefill_dma32 66114
pgrefill_normal 1407169
pgrefill_movable 0
pgsteal_kswapd_dma 0
pgsteal_kswapd_dma32 22181873
pgsteal_kswapd_normal 425875886
pgsteal_kswapd_movable 0
pgsteal_direct_dma 0
pgsteal_direct_dma32 10723905
pgsteal_direct_normal 45330060
pgsteal_direct_movable 0
pgscan_kswapd_dma 0
pgscan_kswapd_dma32 32470709
pgscan_kswapd_normal 758168190
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_dma32 55064390
pgscan_direct_normal 449388285
pgscan_direct_movable 0
pgscan_direct_throttle 16
zone_reclaim_failed 0
pginodesteal 329
slabs_scanned 75784518
kswapd_inodesteal 3324
kswapd_low_wmark_hit_quickly 18086579
kswapd_high_wmark_hit_quickly 562
pageoutrun 18100603
allocstall 739928
pgrotated 357590082
drop_pagecache 0
drop_slab 0
numa_pte_updates 0
numa_huge_pte_updates 0
numa_hint_faults 0
numa_hint_faults_local 0
numa_pages_migrated 0
pgmigrate_success 562476
pgmigrate_fail 34076511
compact_migrate_scanned 390290706
compact_free_scanned 17609026156
compact_isolated 37387419
compact_stall 17
compact_fail 10
compact_success 7
compact_daemon_wake 3013752
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 69728
unevictable_pgs_scanned 0
unevictable_pgs_rescued 57566
unevictable_pgs_mlocked 62928
unevictable_pgs_munlocked 59182
unevictable_pgs_cleared 18
unevictable_pgs_stranded 18
thp_fault_alloc 0
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_split_page 0
thp_split_page_failed 0
thp_deferred_split_page 0
thp_split_pmd 0
thp_zero_page_alloc 0
thp_zero_page_alloc_failed 0
balloon_inflate 0
balloon_deflate 0
balloon_migrate 0
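Event counters in the dump above (pgscan_*, pgsteal_*, allocstall,
compact_*, ...) are cumulative since boot, so a single snapshot mixes
the dd test with everything else the system has done. A minimal sketch,
assuming two snapshots saved as plain copies of /proc/vmstat (the file
names before.txt and after.txt are hypothetical), of a helper
vmstat_delta, also a hypothetical name, that prints only the counters
that changed and by how much.

```shell
# Hypothetical helper: print every /proc/vmstat counter whose value
# differs between two saved snapshots, with the difference (after - before).
vmstat_delta() {
  before_file=$1   # snapshot taken before the test
  after_file=$2    # snapshot taken after the test
  awk '
    # First file: remember each counter value by name.
    NR == FNR { before[$1] = $2; next }
    # Second file: report counters present in both snapshots that changed.
    ($1 in before) && ($2 != before[$1]) {
      # %.0f instead of %d so values above 2^31 print correctly in any awk.
      printf "%-32s %.0f\n", $1, $2 - before[$1]
    }
  ' "$before_file" "$after_file"
}
```

Usage: run cat /proc/vmstat > before.txt, start the dd from step 2, run
cat /proc/vmstat > after.txt once the failures appear, then call
vmstat_delta before.txt after.txt.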

Attachment: crypto.txt.gz
Description: GNU Zip compressed data

Attachment: kernel.log.txt.gz
Description: GNU Zip compressed data

Attachment: sysctl.txt.gz
Description: GNU Zip compressed data