Re: [PATCH RFC 0/4] mm/ksm: add option to automerge VMAs

From: Oleksandr Natalenko
Date: Mon May 13 2019 - 07:34:52 EST


Hi.

On Mon, May 13, 2019 at 01:38:43PM +0300, Kirill Tkhai wrote:
> On 10.05.2019 10:21, Oleksandr Natalenko wrote:
> > By default, KSM works only on memory that is marked by madvise(). And the
> > only way to get around that is to either:
> >
> > * use LD_PRELOAD; or
> > * patch the kernel with something like UKSM or PKSM.
> >
> > Instead, let's implement a so-called "always" mode, which automatically
> > marks VMAs as mergeable in do_anonymous_page().
> >
> > The submission introduces a new sysctl knob as well as a kernel cmdline
> > option to control which mode to use. The default mode maintains the old
> > (madvise-based) behaviour.
> >
> > Due to security concerns, this submission also introduces a VM_UNMERGEABLE
> > vmaflag for apps to explicitly opt out of automerging. Because it adds
> > a new vmaflag, the whole work is available for 64-bit architectures only.
> >
> > This patchset is based on Timofey's earlier submission [1], but it doesn't
> > use a dedicated kthread to walk through the list of tasks/VMAs.
> >
> > On my laptop it saves up to 300 MiB of RAM for a usual workflow (browser,
> > terminal, player, chats etc.). Timofey's submission also mentions
> > a containerised workload that benefits from automerging too.
>
> This whole approach looks complicated to me, and I'm not sure the demonstrated
> gain on the desktop is big enough to justify contradictory vma flags, a boot
> option and changes to the page fault handler. Also, the 32/64-bit defines
> don't look good to me. I tried something like this on my laptop some time
> ago, and the result was poor even in absolute terms (not just as a percentage
> of memory). Isn't the LD_PRELOAD trick enough for the desktop? Your workload
> is the same all the time, so you could statically add the right preload to
> /etc/profile and replace your mmap() for good.
>
> Speaking about containers, something like this may make sense, I think.
> The probability that several containers share the same pages is higher
> than for desktop applications; also, LD_PRELOAD is not applicable to
> containers.
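For reference, the LD_PRELOAD trick discussed here usually amounts to a small shim that wraps mmap() and opts new private anonymous mappings into KSM with madvise(MADV_MERGEABLE). What follows is a minimal sketch under those assumptions, not the preload anyone in this thread actually uses. The file name ksm_preload.c and the build line are illustrative, and it assumes a 64-bit glibc target.

```c
/* ksm_preload.c: hypothetical LD_PRELOAD shim (illustrative sketch only).
 * Build: gcc -shared -fPIC -o ksm_preload.so ksm_preload.c -ldl
 * Use:   LD_PRELOAD=./ksm_preload.so some_program
 * Assumes a 64-bit glibc target, where off_t is 64 bits and mmap is not
 * redirected to mmap64.
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

typedef void *(*mmap_fn)(void *, size_t, int, int, int, off_t);

/* libc's mmap(), resolved lazily; atomic so concurrent first calls are safe. */
static _Atomic(mmap_fn) real_mmap;

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
    mmap_fn next = atomic_load(&real_mmap);
    if (next == NULL) {
        /* Look up the next definition of mmap after this object (libc's). */
        next = (mmap_fn)dlsym(RTLD_NEXT, "mmap");
        if (next == NULL) {
            errno = ENOSYS;
            return MAP_FAILED;
        }
        atomic_store(&real_mmap, next);
    }

    void *p = next(addr, length, prot, flags, fd, offset);

    /* KSM only merges private anonymous memory, so skip everything else.
     * Checking both bits also excludes MAP_SHARED_VALIDATE. */
    if (p != MAP_FAILED && (flags & MAP_ANONYMOUS) &&
        (flags & (MAP_SHARED | MAP_PRIVATE)) == MAP_PRIVATE) {
        int saved_errno = errno;
        /* Fails with EINVAL on kernels built without CONFIG_KSM; ignored. */
        (void)madvise(p, length, MADV_MERGEABLE);
        errno = saved_errno;
    }
    return p;
}
```

This is also why the shim does not help with containers: it only affects processes started with LD_PRELOAD set, and it cannot opt out individual VMAs.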

Yes, I get your point. But the intention is to avoid another hacky trick
(LD_PRELOAD), so *something* should *preferably* be done at the
kernel level instead.

> But 1) this could be enabled for trusted containers only (does KSM have
> issues similar to hardware side-channel attacks?!);

Regarding side-channel attacks, yes, I think so. Wasn't it the OpenSSL
folks who complained about it?

> 2) in my experience, the most widely shared data for containers is the
> file cache, which KSM does not handle (it only merges anonymous pages).
>
> The results at the link [1] look good, but it's difficult to analyze
> them without knowing what actually happens inside those tests.
>
> Some of the tests have a "VM" prefix. Why doesn't the hypervisor mark
> its VMAs as mergeable? Can't this be fixed in the hypervisor? What is the
> general reason the VMAs are not marked in all the tests?

Timofey, could you please address this?

Also, just to add another data point, here is my current KSM saving in MiB
(pages_sharing, assuming 4 KiB pages):

$ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
526
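The bc one-liner above hardcodes 4 KiB pages and truncates the result (bc's default scale is 0). Below is a minimal C sketch of the same conversion that takes the page size from sysconf(_SC_PAGESIZE) instead. The helper names are made up for illustration. The KSM documentation describes pages_sharing as how many more sites share the merged pages, which approximates how much memory is saved.

```c
#include <stdio.h>
#include <unistd.h>

/* Convert a page count to MiB, truncating like bc with its default scale=0. */
static unsigned long long pages_to_mib(unsigned long long pages,
                                       unsigned long long page_size)
{
    return pages * page_size / (1024ULL * 1024ULL);
}

/* Read /sys/kernel/mm/ksm/pages_sharing; returns 0 on success, -1 on failure. */
static int read_pages_sharing(unsigned long long *out)
{
    FILE *f = fopen("/sys/kernel/mm/ksm/pages_sharing", "r");
    if (f == NULL)
        return -1;
    int rc = (fscanf(f, "%llu", out) == 1) ? 0 : -1;
    fclose(f);
    return rc;
}

/* Print the current KSM saving in MiB; returns -1 if KSM stats are unavailable. */
static int print_ksm_saving(void)
{
    unsigned long long pages;
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || read_pages_sharing(&pages) != 0)
        return -1;
    printf("%llu\n", pages_to_mib(pages, (unsigned long long)page_size));
    return 0;
}
```

With 4 KiB pages, pages_to_mib(pages, 4096) gives the same value as the bc expression, because pages * 4096 / 1048576 and pages * 4 / 1024 truncate identically.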

> If there is a fundamental problem with calling madvise, can't we
> just implement an easier workaround like a new write-only file:
>
> # echo $task > /sys/kernel/mm/ksm/force_madvise
>
> which will mark all anon VMAs as mergeable for the given task's mm?
>
> A small userspace daemon could periodically write the tasks to be merged there.
>
> Then we won't need to introduce additional vm flags or change the
> anon pagefault handler, and the changes will be small, confined to
> mm/ksm.c, and good enough for both 32- and 64-bit machines.

Yup, looks appealing. Two concerns, though:

1) we are falling back to scanning through the list of tasks (I guess
this is what we wanted to avoid, although this time it happens in
userspace);

2) what kinds of opt-out should we maintain? Like, what if force_madvise
is called, but the task doesn't want some VMAs to be merged? This would
require a new flag anyway, it seems. And should there be another
write-only file to forcibly unmerge everything for a specific task?
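To make concern 1) concrete: the userspace side of the proposed force_madvise interface would essentially be a periodic walk over /proc. The sketch below assumes the file exists exactly as proposed at /sys/kernel/mm/ksm/force_madvise and accepts one PID per write. That file is hypothetical and is not present in mainline kernels. The helper names and the `want` selection callback are illustrative only.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical control file from the proposal above; not in mainline kernels. */
#define FORCE_MADVISE_PATH "/sys/kernel/mm/ksm/force_madvise"

/* True if a /proc entry name is all digits, i.e. a PID directory. */
static bool is_pid_name(const char *name)
{
    if (*name == '\0')
        return false;
    for (; *name != '\0'; name++)
        if (!isdigit((unsigned char)*name))
            return false;
    return true;
}

/* Write one PID to the control file (assumed format: "<pid>\n").
 * Returns 0 on success, -1 on failure. */
static int force_madvise_pid(const char *ctl_path, const char *pid)
{
    FILE *f = fopen(ctl_path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", pid) > 0;
    if (fclose(f) != 0) /* sysfs reports write errors on flush/close */
        ok = 0;
    return ok ? 0 : -1;
}

/* One pass over proc_dir: write every PID accepted by want() (or all PIDs
 * if want is NULL). Returns the number of successful writes, or -1 if
 * proc_dir cannot be opened. */
static int force_madvise_scan(const char *proc_dir, const char *ctl_path,
                              bool (*want)(const char *pid))
{
    DIR *d = opendir(proc_dir);
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (!is_pid_name(de->d_name))
            continue;
        if (want != NULL && !want(de->d_name))
            continue;
        if (force_madvise_pid(ctl_path, de->d_name) == 0)
            n++;
    }
    closedir(d);
    return n;
}
```

A daemon would call force_madvise_scan("/proc", FORCE_MADVISE_PATH, NULL) in a loop with a sleep between passes. This is the userspace task-list scan from concern 1). The sketch also says nothing about concern 2): per-VMA opt-out would still need a flag on the kernel side.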

Thanks.

P.S. Cc'ing Pavel properly this time.

--
Best regards,
Oleksandr Natalenko (post-factum)
Senior Software Maintenance Engineer