[bug, 4.8] /proc/meminfo: counter values are very wrong

From: Dave Chinner
Date: Thu Aug 04 2016 - 01:11:48 EST


Hi folks,

I just noticed a whacky memory usage profile when running some basic
IO tests on a current 4.8 tree. From my monitoring graphs it looked
like there was a massive memory leak - doing buffered IO was causing
huge amounts of memory to be considered used, but the cache size was
not increasing.

Looking at /proc/meminfo:

$ cat /proc/meminfo
MemTotal: 16395408 kB
MemFree: 79424 kB
MemAvailable: 2497240 kB
Buffers: 4372 kB
Cached: 558744 kB
SwapCached: 48 kB
Active: 2127212 kB
Inactive: 100400 kB
Active(anon): 25348 kB
Inactive(anon): 79424 kB
Active(file): 2101864 kB
Inactive(file): 20976 kB
Unevictable: 13612980 kB <<<<<<<<<
Mlocked: 3516 kB
SwapTotal: 497976 kB
SwapFree: 497188 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 38784 kB
Mapped: 15880 kB
Shmem: 8808 kB
Slab: 460408 kB
SReclaimable: 428496 kB
SUnreclaim: 31912 kB
KernelStack: 6112 kB
PageTables: 6740 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 8695680 kB
Committed_AS: 177456 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 14204 kB
DirectMap2M: 16762880 kB

It seems that whatever was happening was causing unevictable memory
to pile up. But when I look at the per-node stats, the memory is all
accounted as active file pages:

$ cat /sys/bus/node/devices/node0/meminfo
Node 0 MemTotal: 4029052 kB
Node 0 MemFree: 33276 kB
Node 0 MemUsed: 3995776 kB
Node 0 Active: 3283280 kB
Node 0 Inactive: 580668 kB
Node 0 Active(anon): 3564 kB
Node 0 Inactive(anon): 4716 kB
Node 0 Active(file): 3279716 kB <<<<<<<<
Node 0 Inactive(file): 575952 kB
Node 0 Unevictable: 1648 kB
Node 0 Mlocked: 1648 kB
Node 0 Dirty: 8 kB
Node 0 Writeback: 0 kB
Node 0 FilePages: 78796 kB
Node 0 Mapped: 5540 kB
Node 0 AnonPages: 8020 kB
Node 0 Shmem: 256 kB
Node 0 KernelStack: 2352 kB
Node 0 PageTables: 1976 kB
Node 0 NFS_Unstable: 0 kB
Node 0 Bounce: 0 kB
Node 0 WritebackTmp: 0 kB
Node 0 Slab: 109012 kB
Node 0 SReclaimable: 99156 kB
Node 0 SUnreclaim: 9856 kB
Node 0 HugePages_Total: 0
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
$ cat /sys/bus/node/devices/node1/meminfo
Node 1 MemTotal: 4127912 kB
Node 1 MemFree: 13888 kB
Node 1 MemUsed: 4114024 kB
Node 1 Active: 3455400 kB
Node 1 Inactive: 522156 kB
Node 1 Active(anon): 5556 kB
Node 1 Inactive(anon): 6784 kB
Node 1 Active(file): 3449844 kB
Node 1 Inactive(file): 515372 kB
Node 1 Unevictable: 52 kB
Node 1 Mlocked: 52 kB
Node 1 Dirty: 16 kB
Node 1 Writeback: 0 kB
Node 1 FilePages: 155684 kB
Node 1 Mapped: 2216 kB
Node 1 AnonPages: 12320 kB
Node 1 Shmem: 16 kB
Node 1 KernelStack: 720 kB
Node 1 PageTables: 1120 kB
Node 1 NFS_Unstable: 0 kB
Node 1 Bounce: 0 kB
Node 1 WritebackTmp: 0 kB
Node 1 Slab: 117340 kB
Node 1 SReclaimable: 111472 kB
Node 1 SUnreclaim: 5868 kB
Node 1 HugePages_Total: 0
Node 1 HugePages_Free: 0
Node 1 HugePages_Surp: 0
$ cat /sys/bus/node/devices/node2/meminfo
Node 2 MemTotal: 4127912 kB
Node 2 MemFree: 21308 kB
Node 2 MemUsed: 4106604 kB
Node 2 Active: 3453056 kB
Node 2 Inactive: 517824 kB
Node 2 Active(anon): 3224 kB
Node 2 Inactive(anon): 4356 kB
Node 2 Active(file): 3449832 kB
Node 2 Inactive(file): 513468 kB
Node 2 Unevictable: 556 kB
Node 2 Mlocked: 556 kB
Node 2 Dirty: 0 kB
Node 2 Writeback: 0 kB
Node 2 FilePages: 150120 kB
Node 2 Mapped: 1840 kB
Node 2 AnonPages: 7476 kB
Node 2 Shmem: 232 kB
Node 2 KernelStack: 1184 kB
Node 2 PageTables: 1360 kB
Node 2 NFS_Unstable: 0 kB
Node 2 Bounce: 0 kB
Node 2 WritebackTmp: 0 kB
Node 2 Slab: 114288 kB
Node 2 SReclaimable: 107616 kB
Node 2 SUnreclaim: 6672 kB
Node 2 HugePages_Total: 0
Node 2 HugePages_Free: 0
Node 2 HugePages_Surp: 0
$ cat /sys/bus/node/devices/node3/meminfo
Node 3 MemTotal: 4110532 kB
Node 3 MemFree: 10224 kB
Node 3 MemUsed: 4100308 kB
Node 3 Active: 3442224 kB
Node 3 Inactive: 506564 kB
Node 3 Active(anon): 8636 kB
Node 3 Inactive(anon): 9492 kB
Node 3 Active(file): 3433588 kB
Node 3 Inactive(file): 497072 kB
Node 3 Unevictable: 1260 kB
Node 3 Mlocked: 1260 kB
Node 3 Dirty: 0 kB
Node 3 Writeback: 0 kB
Node 3 FilePages: 178564 kB
Node 3 Mapped: 6284 kB
Node 3 AnonPages: 10968 kB
Node 3 Shmem: 8304 kB
Node 3 KernelStack: 1856 kB
Node 3 PageTables: 2284 kB
Node 3 NFS_Unstable: 0 kB
Node 3 Bounce: 0 kB
Node 3 WritebackTmp: 0 kB
Node 3 Slab: 119736 kB
Node 3 SReclaimable: 110252 kB
Node 3 SUnreclaim: 9484 kB
Node 3 HugePages_Total: 0
Node 3 HugePages_Free: 0
Node 3 HugePages_Surp: 0
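
A quick way to cross-check the two views is to sum the per-node LRU
counters and put them next to the global ones. The sketch below is
only illustrative: the values are hard-coded from the dumps above, and
the names GLOBAL, PER_NODE, node_totals and report are made up for
this example.

```python
# Values in kB, copied by hand from the /proc/meminfo and per-node
# meminfo dumps above. Nothing here reads the live system.
GLOBAL = {
    "Active(anon)": 25348,
    "Inactive(anon)": 79424,
    "Active(file)": 2101864,
    "Inactive(file)": 20976,
    "Unevictable": 13612980,
}

# One list entry per node, in the order node0..node3.
PER_NODE = {
    "Active(anon)": [3564, 5556, 3224, 8636],
    "Inactive(anon)": [4716, 6784, 4356, 9492],
    "Active(file)": [3279716, 3449844, 3449832, 3433588],
    "Inactive(file)": [575952, 515372, 513468, 497072],
    "Unevictable": [1648, 52, 556, 1260],
}


def node_totals(per_node):
    """Sum each counter across all nodes."""
    return {name: sum(values) for name, values in per_node.items()}


def report(global_stats, per_node):
    """Return (name, global, node total, difference) for each counter."""
    totals = node_totals(per_node)
    return [
        (name, global_stats[name], total, global_stats[name] - total)
        for name, total in totals.items()
    ]


if __name__ == "__main__":
    for name, glob_kb, node_kb, diff in report(GLOBAL, PER_NODE):
        print(f"{name:16s} global={glob_kb:>10d} nodes={node_kb:>10d} "
              f"diff={diff:>+10d}")
```

With these numbers the per-node Active(file) total is 13612980 kB,
exactly the global Unevictable value, and the per-node Inactive(file)
total (2101864 kB) matches the global Active(file) value. That would
be consistent with the global LRU lines being reported one slot off,
but that is only an inference from this one sample.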

So clearly there's an accounting problem here. I think there may be
multiple problems, however. The workload is simple:

$ time for i in `seq 0 1 100`; do
> sudo rm -f /mnt/scratch/testfile
> sudo xfs_io -f -c "pwrite 0 512m -b 32k" -c "pwrite 0 511m -b 32k" /mnt/scratch/testfile &> /dev/null
> done

It just writes 512MB to a file, overwrites the first 511MB of it,
then unlinks the file and starts again. On unlink, the page cache is
invalidated, which means all 512MB of cached pages should be freed
and removed from the page cache. According to the per-node counters,
that is not happening and there are gigabytes of invalidated pages
still sitting on the active LRUs.
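
To watch this while the loop runs, something like the following could
sample the per-node file LRU counters once a second. It is only a
sketch: parse_meminfo and sum_nodes are made-up helper names, and it
assumes the same /sys/bus/node/devices/node*/meminfo files shown
above are present.

```python
import glob
import re
import time

# Matches both "MemTotal:   16395408 kB" and
# "Node 0 Active(file): 3279716 kB"; the "Node N" prefix and the
# trailing "kB" are optional.
LINE_RE = re.compile(r"^(?:Node\s+\d+\s+)?(\S+):\s+(\d+)(?:\s+kB)?\s*$")


def parse_meminfo(text):
    """Parse meminfo-style text into a {field name: value} dict."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def sum_nodes(pattern="/sys/bus/node/devices/node*/meminfo"):
    """Sum every counter across all per-node meminfo files."""
    totals = {}
    for path in glob.glob(pattern):
        with open(path) as f:
            for name, value in parse_meminfo(f.read()).items():
                totals[name] = totals.get(name, 0) + value
    return totals


if __name__ == "__main__":
    # Take ten samples, one second apart, while the workload runs.
    for _ in range(10):
        totals = sum_nodes()
        print("Active(file)=%d kB Inactive(file)=%d kB FilePages=%d kB" % (
            totals.get("Active(file)", 0),
            totals.get("Inactive(file)", 0),
            totals.get("FilePages", 0),
        ))
        time.sleep(1)
```

If the unlinked file's pages were really being freed, the file LRU
totals would be expected to drop back after each iteration rather
than stay in the gigabytes while FilePages remains small.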

Something is broken....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx