On Mon 15-08-16 12:25:10, Robert Foss wrote:
[...]
On 2016-08-15 09:42 AM, Michal Hocko wrote:
The use case is to speed up monitoring of
memory consumption in environments where RSS isn't precise.
For example, Chrome tends to have many processes which have hundreds of VMAs
with a substantial amount of shared memory, and the error of using
RSS rather than PSS tends to be very large when looking at overall
memory consumption. PSS isn't kept as a single number that's exported
like RSS, so to calculate PSS means having to parse a very large smaps
file.
This process is slow and has to be repeated for many processes, and we
found that just the act of doing the parsing was taking up a
significant amount of CPU time, so this patch is an attempt to make
that process cheaper.
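
To make the RSS vs PSS gap described above concrete, here is a rough
sketch of the arithmetic. The 4 kB page size, the helper name and the
page counts are all made up for illustration; the only thing taken as
given is the usual definition of PSS, where each resident page is
charged to a process in proportion to how many processes map it.

```python
PAGE_KB = 4  # assumed page size in kB, for illustration only


def rss_and_pss_kb(page_mapcounts):
    """Return (rss, pss) in kB for one process.

    page_mapcounts holds, for every resident page of the process,
    the number of processes that currently map that page.
    """
    # RSS charges every resident page in full.
    rss = PAGE_KB * len(page_mapcounts)
    # PSS charges each page divided by the number of mappers.
    pss = sum(PAGE_KB / n for n in page_mapcounts)
    return rss, pss


# Hypothetical process: 100 private pages plus 900 pages that are
# shared with 9 other processes (10 mappers each).
counts = [1] * 100 + [10] * 900
rss, pss = rss_and_pss_kb(counts)
print(f"rss:{rss} kB pss:{pss:.0f} kB")
```

With these invented numbers RSS is 4000 kB while PSS is about 760 kB,
so summing RSS over ten such processes would overstate the shared
part tenfold.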
Well, this is slow because it requires the pte walk, otherwise you cannot
know how many ptes map the particular shared page. Your patch
(totmaps_proc_show) does the very same page table walk because in fact
it is unavoidable. So what exactly is the difference, except for the
userspace parsing, which is quite trivial? E.g. my currently running Firefox
has
$ awk '/^[0-9a-f]/{print}' /proc/4950/smaps | wc -l
984
quite some VMAs, yet parsing it spends basically all the time in the kernel...
$ /usr/bin/time -v awk '/^Rss/{rss+=$2} /^Pss/{pss+=$2} END {printf "rss:%d pss:%d\n", rss, pss}' /proc/4950/smaps
rss:1112288 pss:1096435
Command being timed: "awk /^Rss/{rss+=$2} /^Pss/{pss+=$2} END {printf "rss:%d pss:%d\n", rss, pss} /proc/4950/smaps"
User time (seconds): 0.00
System time (seconds): 0.02
Percent of CPU this job got: 91%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
So I am not really sure I see the performance benefit.
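
For anybody who wants to repeat the measurement from a monitoring
tool rather than from awk, something along these lines should do. It
is only a sketch: sum_smaps() and timed_smaps() are names I made up,
and the split between user and system time comes from getrusage() on
the calling process, so it also includes whatever else that process
did in between.

```python
import resource


def sum_smaps(text):
    """Sum the Rss: and Pss: fields (kB) over all VMAs in smaps text."""
    rss = pss = 0
    for line in text.splitlines():
        # Match the exact field name including the colon so that other
        # fields sharing the same prefix are not counted by accident.
        if line.startswith("Rss:"):
            rss += int(line.split()[1])
        elif line.startswith("Pss:"):
            pss += int(line.split()[1])
    return rss, pss


def timed_smaps(pid):
    """Read /proc/<pid>/smaps and return (rss, pss, user_s, system_s)."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    with open(f"/proc/{pid}/smaps") as f:
        text = f.read()  # the kernel does the pte walk while we read
    rss, pss = sum_smaps(text)
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (rss, pss,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)
```

Calling timed_smaps(4950) on the box above would be the equivalent of
the awk run; I have not measured it, so no numbers here.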