committed memory, mmaps and shms

From: Marcos Dione
Date: Wed Mar 11 2015 - 14:19:45 EST

Hi everybody. First, I hope this is the right list for these
questions; I searched in the list of lists[1] for an MM-specific one, but
didn't find any. Second, I'm not subscribed, so please CC me and my other
address when answering.

I'm trying to figure out how Linux really accounts for memory, both
globally and for each individual process. Most users' first approach to
memory monitoring is running free (no pun intended):

$ free
             total       used       free     shared    buffers     cached
Mem:     396895176  395956332     938844          0       8972  356409952
-/+ buffers/cache:   39537408  357357768
Swap:      8385788    8385788          0

This reports 378GiB of RAM, 377 used; of those 8MiB in buffers,
339GiB in cache, leaving only 38GiB for processes (for some reason this
value is not displayed, which should probably be a warning to what is to
come); and 1GiB free. So far all seems good.
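
A minimal sketch of the arithmetic behind these figures, assuming free reports KiB (its default unit). The variable names are illustrative; the numbers are copied from the output above. With these numbers, subtracting buffers and cache from "used" gives exactly the "used" column of the "-/+ buffers/cache" row:

```python
# Values in KiB, copied from the free output above.
total, used, free = 396895176, 395956332, 938844
buffers, cached = 8972, 356409952

KIB_PER_GIB = 1024 * 1024


def gib(kib):
    """Convert a KiB count to GiB (as a float)."""
    return kib / KIB_PER_GIB


# Memory left for processes once buffers and cache are set aside,
# and memory that is free or could be reclaimed from buffers/cache.
used_by_processes = used - buffers - cached
free_plus_cache = free + buffers + cached

for label, kib in [
    ("total", total),
    ("used", used),
    ("cached", cached),
    ("used by processes", used_by_processes),
    ("free", free),
]:
    print(f"{label:>18}: {gib(kib):8.2f} GiB")
```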

Now, this machine has (at least) a 108 GiB shm. All this memory is
clearly counted as cache. This is my first surprise. shms are not cache
of anything on disk, but spaces of shared memory (duh); at most, their
pages can end up in swap, but not in a file somewhere. Maybe I'm not
correctly interpreting the meaning of (what is accounted as) cache.

The next tool in the toolbox is ps:

$ ps ux | grep 27595
osysops 27595 49.5 12.7 5506723020 50525312 ? Sl 05:20 318:02 otf_be v2.9.0.13 : FQ_E08AS FQ_E08-FQDSIALT #1 [processing daemon lib, msg type: undefined]

This process is not only attached to that shm, it's also attached to
5TiB of mmap'ed files (128 LMDB databases), for a total of 5251GiB. For
context, know that another 9 processes do the same. This tells me that
shms and mmaps are counted as part of their virtual size, which makes
sense. Of those, only 48GiB are resident... but a couple of paragraphs
before I said that there were only 38GiB used by processes. Clearly some
part of each individual process' RSS also counts at least some part of
the mmaps. /proc/27595/smaps has more info:

$ cat /proc/27595/smaps | awk 'BEGIN { count= 0; } /Rss/ { count = count + $2; print } /Pss/ { print } /Swap/ { print } /^Size/ { print } /-/ { print } END { print count }'
7f2987e92000-7f3387e92000 rw-s 00000000 fc:11 3225448420 /instant/LMDBMedium_0000000000/data.mdb
Size: 41943040 kB
Rss: 353164 kB
Pss: 166169 kB
Swap: 0 kB
7f33df965000-7f4f1cdcc000 rw-s 00000000 00:04 454722576 /SYSV00000000 (deleted)
Size: 114250140 kB
Rss: 5587224 kB
Pss: 3856206 kB
Swap: 0 kB

Notice that the sum is not the same as the one reported before; maybe
because I took them at different points in time while writing this
mail. So this confirms that a process' RSS value includes shms and mmaps,
at least the resident part. In the case of the mmaps, the resident part
must be the part that currently sits in the cache; in the case of the
shms, I suppose it's the part that has ever been used. An internal tool
tells me that currently 24GiB of that shm is in use, but only 5 are
reported as part of that process' RSS. Maybe that is the part this process has used?
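
To separate the two kinds of mappings instead of summing everything, the same smaps data can be grouped by mapping type. This is only a sketch: the classify() rule (paths starting with /SYSV are SysV shm, other absolute paths are file mmaps) is an assumption made for illustration, and SAMPLE is just the two mappings quoted above rather than the whole file.

```python
import re
from collections import defaultdict

# The two mappings quoted above; on a live system one would read
# /proc/<pid>/smaps instead.
SAMPLE = """\
7f2987e92000-7f3387e92000 rw-s 00000000 fc:11 3225448420 /instant/LMDBMedium_0000000000/data.mdb
Size: 41943040 kB
Rss: 353164 kB
Pss: 166169 kB
Swap: 0 kB
7f33df965000-7f4f1cdcc000 rw-s 00000000 00:04 454722576 /SYSV00000000 (deleted)
Size: 114250140 kB
Rss: 5587224 kB
Pss: 3856206 kB
Swap: 0 kB
"""

# A mapping header starts with an address range such as "7f29...-7f33...".
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")
FIELD = re.compile(r"^(Size|Rss|Pss|Swap):\s+(\d+) kB")


def classify(header):
    """Rough label for a mapping, based only on its path column (assumption)."""
    parts = header.split(None, 5)
    path = parts[5] if len(parts) > 5 else ""
    if path.startswith("/SYSV"):
        return "shm"
    if path.startswith("/"):
        return "file"
    return "other"


def summarize(text):
    """Sum Size/Rss/Pss/Swap (in kB) per mapping kind."""
    totals = defaultdict(lambda: defaultdict(int))
    kind = "other"
    for line in text.splitlines():
        if HEADER.match(line):
            kind = classify(line)
            continue
        m = FIELD.match(line)
        if m:
            totals[kind][m.group(1)] += int(m.group(2))
    return totals


totals = summarize(SAMPLE)
for kind, fields in totals.items():
    print(kind, dict(fields))
```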

And now I reach what I find most confusing (uninteresting values
removed):

$ cat /proc/meminfo
MemTotal:        396895176 kB
MemFree:            989392 kB
Buffers:              8448 kB
Cached:          344059556 kB
SwapTotal:         8385788 kB
SwapFree:                0 kB
Mapped:          147188944 kB
Shmem:           109114792 kB
CommitLimit:     206833376 kB
Committed_AS:    349194180 kB
VmallocTotal:  34359738367 kB
VmallocUsed:       1222960 kB
VmallocChunk:  34157188704 kB

Again, values might vary due to timing. Mapped clearly includes Shmem
but not mmaps; in theory 36GiB are 'pure' (not shm'ed, not mmap'ed)
process memory, close to what I calculated before. Again, this is not
segregated, which again makes us wonder why. Probably it's more like "It
doesn't make sense to do it".
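
The 36GiB figure can be reproduced with a small sketch. It simply encodes the guess made above (that Mapped includes Shmem), which is an assumption and not something the kernel documentation is being quoted on here; parse_meminfo() is an illustrative helper, and MEMINFO holds only the relevant lines from the output above.

```python
# Relevant lines from the /proc/meminfo output above (values in kB).
MEMINFO = """\
MemTotal: 396895176 kB
Cached: 344059556 kB
Mapped: 147188944 kB
Shmem: 109114792 kB
"""

KIB_PER_GIB = 1024 * 1024


def parse_meminfo(text):
    """Return a {name: value_in_kB} dict from meminfo-style lines."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


info = parse_meminfo(MEMINFO)

# Assumption from the text: Mapped includes Shmem, so the difference
# would be the 'pure' process memory.
mapped_minus_shmem = info["Mapped"] - info["Shmem"]

print(f"Mapped - Shmem = {mapped_minus_shmem} kB "
      f"({mapped_minus_shmem / KIB_PER_GIB:.1f} GiB)")
```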

Last but definitely not least, Committed_AS is 333GiB, close to the
total mem. man proc says it's «The amount of memory presently allocated
on the system. The committed memory is a sum of all of the memory which
has been allocated by processes, even if it has not been "used" by them
as of yet». What is not clear is whether this counts mmaps (I think it
doesn't, or it would be either 5TiB or 50TiB, depending on whether you
count each attachment to each shm) and/or shms (once, or multiple
times?). In a rough calculation, the 83 procs using the same 108GiB shm
account for 9TiB, so at least it's not counting it multiple times.
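
The rough calculation can be written down as a sketch that compares each counting hypothesis against Committed_AS. The sizes are the round figures quoted in this mail (5TiB of mmaps, 10 processes mapping them, a 108GiB shm with 83 attached processes), and the hypothesis names are only labels for illustration. A hypothesis is ruled out when it alone would already exceed the reported value.

```python
KIB_PER_GIB = 1024 ** 2
GIB_PER_TIB = 1024

# Committed_AS from /proc/meminfo above, converted to GiB.
committed_as_gib = 349194180 / KIB_PER_GIB

# Candidate contributions in GiB, using the round figures from the text.
hypotheses = {
    "mmaps counted once": 5 * GIB_PER_TIB,
    "mmaps counted per process (10 processes)": 10 * 5 * GIB_PER_TIB,
    "shm counted per attachment (83 processes)": 83 * 108,
    "shm counted once": 108,
}

verdicts = {}
for name, size_gib in hypotheses.items():
    # Anything larger than Committed_AS cannot be part of it.
    verdicts[name] = "ruled out" if size_gib > committed_as_gib else "possible"
    print(f"{name:<45} {size_gib:>7} GiB  {verdicts[name]}")

print(f"Committed_AS: {committed_as_gib:.0f} GiB")
```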

While we're at it, I would like to know what VmallocTotal (32TiB) is
accounting. The explanation in man proc («Total size of vmalloc memory
area.», where vmalloc seems to be a kernel internal function to «allocate
a contiguous memory region in the virtual address space») doesn't mean
much to me. At some point I thought it should be the sum of all VSSs, but
that clocks at 50TiB for me, so it isn't. Maybe I should just ignore it.
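
One observation on the number itself, as a small sketch: the reported VmallocTotal is exactly one kB short of 2^35 kB, i.e. 32TiB. That is consistent with it being the size of a fixed address range rather than a sum of anything measured, though this reading is an assumption and not something taken from the man page.

```python
# VmallocTotal from /proc/meminfo above, in kB.
vmalloc_total_kib = 34359738367

KIB_PER_TIB = 1024 ** 3

# One more kB would make it an exact power of two.
is_power_of_two_minus_one = (vmalloc_total_kib + 1) == 2 ** 35
vmalloc_total_tib = vmalloc_total_kib / KIB_PER_TIB

print(f"VmallocTotal = {vmalloc_total_tib:.2f} TiB")
print(f"exactly 2**35 kB minus one: {is_power_of_two_minus_one}")
```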

Short version:

* Why is 'pure' malloc'ed memory never reported? Does it make sense to
talk about it?

* Why do shms show up in cache? What does cache currently mean/hold?

* What does the RSS value mean for the shms in each proc's smaps file?
And for mmaps?

* Is my conclusion about Shmem being counted into Mapped correct?

* What is actually counted in Committed_AS? Does it count shms or mmaps?

* What is VmallocTotal?

Thanks in advance, first for reaching the end of this longish mail,
and second if you ever give any clues about any of these questions.

-- Marcos.

