[PATCH v2 4/5] mm/madvise: allow KSM hints for remote API

From: Minchan Kim
Date: Thu Jan 16 2020 - 19:00:17 EST


From: Oleksandr Natalenko <oleksandr@xxxxxxxxxx>

It all began with the fact that KSM works only on memory that has been
marked via madvise(). The only way to get around that is to either:

* use LD_PRELOAD; or
* patch the kernel with something like UKSM or PKSM.

(I intentionally skip the ptrace can of worms here.)

To overcome this restriction, let's employ the new remote madvise API. It
can be used by a small userspace helper daemon that does the auto-KSM
job for us.
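
As an illustration only, one piece of such a helper could be picking
candidate ranges out of /proc/<pid>/maps. The sketch below is not part of
this series: the names ksm_range and ksm_candidate are hypothetical, and
the "private, writable, no backing file (or [heap])" filter is merely an
assumed heuristic for anonymous memory. The remote madvise call itself is
deliberately left out, since its signature is defined by the other patches
in the series.

```c
#include <stdio.h>
#include <string.h>

/* One candidate address range in the target process (hypothetical type). */
struct ksm_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Parse one line of /proc/<pid>/maps.
 *
 * Returns 1 and fills *r when the mapping is private, writable and either
 * has no name or is "[heap]" (an assumed heuristic for anonymous memory);
 * returns 0 for everything else, including lines that fail to parse.
 */
static int ksm_candidate(const char *line, struct ksm_range *r)
{
	unsigned long start, end, offset, inode;
	unsigned int maj, min;
	char perms[5];
	char name[256] = "";
	int n;

	/* The name field is optional, so 7 or 8 converted items are valid. */
	n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255s",
		   &start, &end, perms, &offset, &maj, &min, &inode, name);
	if (n < 7 || strlen(perms) != 4)
		return 0;

	/* Require "rw?p": readable, writable, private. */
	if (perms[0] != 'r' || perms[1] != 'w' || perms[3] != 'p')
		return 0;

	/* Skip file-backed and special mappings other than the heap. */
	if (name[0] != '\0' && strcmp(name, "[heap]") != 0)
		return 0;

	r->start = start;
	r->end = end;
	return 1;
}
```

A daemon would feed each line of the maps file through such a filter and
then hint the resulting ranges as MADV_MERGEABLE via the remote API.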

I can think of two major consumers of remote KSM hints:

* hosts that run containers, especially similar ones in a trusted
  environment, sharing the same runtime such as Node.js;

* heavy applications that can be run in multiple instances, not
  limited to open-source ones like Firefox, but also those that cannot
  be modified because they are binary-only and, possibly, statically
  linked.

Speaking of statistics, more numbers can be found in the very first
submission related to this one [1]. On my current setup with two Firefox
instances, I get 100 to 200 MiB saved for the second instance, depending
on the number of tabs.

1 FF instance with 15 tabs:

$ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
410

2 FF instances, second one has 12 tabs (all the tabs are different):

$ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
592
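
The conversion used in the two measurements above (pages_sharing times
4 KiB, divided down to MiB) could also be done from a helper program. The
following is a minimal sketch with hypothetical function names; unlike the
shell one-liner, it assumes the page size should be taken from sysconf()
rather than hard-coded to 4 KiB.

```c
#include <stdio.h>
#include <unistd.h>

/* Convert a page count to whole MiB, truncating like bc's integer division. */
static unsigned long pages_to_mib(unsigned long pages, unsigned long page_size)
{
	return pages * (page_size / 1024) / 1024;
}

/* Read a single unsigned counter from a file; 0 on success, -1 on error. */
static int read_pages_sharing(const char *path, unsigned long *pages)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lu", pages) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Current pages_sharing expressed in MiB, or -1 if it cannot be read. */
static long ksm_saved_mib(void)
{
	unsigned long pages;
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size <= 0 ||
	    read_pages_sharing("/sys/kernel/mm/ksm/pages_sharing", &pages))
		return -1;
	return (long)pages_to_mib(pages, (unsigned long)page_size);
}
```

With a 4 KiB page size, pages_to_mib() computes the same value as the bc
expression.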

At the moment I do not have specific numbers for containerised
workloads, but those should be comparable as long as the containers share
a similar or identical runtime.

[1] https://lore.kernel.org/patchwork/patch/1012142/

Signed-off-by: Oleksandr Natalenko <oleksandr@xxxxxxxxxx>
Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
---
mm/madvise.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 84cffd0900f1..89557998d287 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1000,6 +1000,8 @@ process_madvise_behavior_valid(int behavior)
 	switch (behavior) {
 	case MADV_COLD:
 	case MADV_PAGEOUT:
+	case MADV_MERGEABLE:
+	case MADV_UNMERGEABLE:
 		return true;
 	default:
 		return false;
--
2.25.0.rc1.283.g88dfdc4193-goog