Re: [PATCH RFC 0/4] mm/ksm: add option to automerge VMAs

From: Timofey Titovets
Date: Mon May 13 2019 - 07:50:39 EST


On Mon, 13 May 2019 at 14:33, Oleksandr Natalenko <oleksandr@xxxxxxxxxx> wrote:
>
> Hi.
>
> On Mon, May 13, 2019 at 01:38:43PM +0300, Kirill Tkhai wrote:
> > On 10.05.2019 10:21, Oleksandr Natalenko wrote:
> > > By default, KSM works only on memory that is marked by madvise(). And the
> > > only way to get around that is to either:
> > >
> > > * use LD_PRELOAD; or
> > > * patch the kernel with something like UKSM or PKSM.
> > >
> > > Instead, let's implement a so-called "always" mode, which allows marking
> > > VMAs as mergeable on do_anonymous_page() call automatically.
> > >
> > > The submission introduces a new sysctl knob as well as kernel cmdline option
> > > to control which mode to use. The default mode is to maintain old
> > > (madvise-based) behaviour.
> > >
> > > Due to security concerns, this submission also introduces VM_UNMERGEABLE
> > > vmaflag for apps to explicitly opt out of automerging. Because of adding
> > > a new vmaflag, the whole work is available for 64-bit architectures only.
> > >
> > > This patchset is based on Timofey's earlier submission [1], but it doesn't
> > > use a dedicated kthread to walk through the list of tasks/VMAs.
> > >
> > > For my laptop it saves up to 300 MiB of RAM for usual workflow (browser,
> > > terminal, player, chats etc). Timofey's submission also mentions
> > > containerised workload that benefits from automerging too.
> >
> > This whole approach looks complicated to me, and I'm not sure the shown benefit
> > for desktop is big enough to justify introducing contradictory vma flags, a boot
> > option and a more advanced page fault handler. Also, the 32/64-bit defines do not
> > look good to me. I tried something like this on my laptop some time ago, and
> > the result was bad even in absolute terms (not as a percentage of memory).
> > Isn't the LD_PRELOAD trick enough for the desktop? Your workload is the same all
> > the time, so you may statically insert the correct preload into /etc/profile and
> > replace your mmap forever.
> >
> > Speaking about containers, something like this may make sense, I think.
> > The probability that several containers have the same pages is higher
> > than that desktop applications have the same pages; also, LD_PRELOAD is
> > not applicable to containers.
>
> Yes, I get your point. But the intention is to avoid another hacky trick
> (LD_PRELOAD), thus *something* should *preferably* be done on the
> kernel level instead.
>
> > But 1) this could be made for trusted containers only (are there similar
> > issues with KSM like with hardware side-channel attacks?!);
>
> Regarding side-channel attacks, yes, I think so. Were those openssl guys
> who complained about it?..
>
> > 2) the most
> > shared data for containers in my experience is file cache, which is not
> > supported by KSM.
> >
> > There are good results at link [1], but it's difficult to analyze
> > them without knowing what happens inside the tests.
> >
> > Some of the tests have a "VM" prefix. What is the reason the hypervisor doesn't
> > mark their VMAs as mergeable? Can't this be fixed in the hypervisor? What is the
> > generic reason that VMAs are not marked in all the tests?
>
> Timofey, could you please address this?

That's just a description of the machine,
only to show the difference in deduplication for an application in a small VM
versus a real big server,
i.e. KSM is enabled inside the VM for containers, not for the hypervisor.

> Also, just for the sake of another piece of stats here:
>
> $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
> 526

IIRC, to calculate the savings you must use (pages_shared - pages_sharing).
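
To compare the two counters side by side, here is a minimal sketch. The
function name ksm_stats and the 4 KiB default page size are assumptions for
illustration, not part of the patchset. For reference, the kernel's KSM
documentation describes pages_shared as the number of shared pages in use and
pages_sharing as how many more sites are sharing them.

```shell
# Print each KSM counter together with its size in MiB.
# $1: KSM sysfs directory (defaults to /sys/kernel/mm/ksm)
# $2: page size in KiB (assumed to be 4 unless given)
ksm_stats() {
    ksm_dir=${1:-/sys/kernel/mm/ksm}
    page_kib=${2:-4}
    for counter in pages_shared pages_sharing; do
        # Bail out if the counter file is missing or unreadable.
        pages=$(cat "$ksm_dir/$counter") || return 1
        echo "$counter: $pages pages, $(( pages * page_kib / 1024 )) MiB"
    done
}
```

Calling ksm_stats with no arguments reads the live counters; on systems with
a different page size, pass the sysfs directory and the page size in KiB as
the two arguments.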

> > In case there is a fundamental problem with calling madvise, can't we
> > just implement an easier workaround like a new write-only file:
> >
> > # echo $task > /sys/kernel/mm/ksm/force_madvise
> >
> > which will mark all anon VMAs as mergeable for a passed task's mm?
> >
> > A small userspace daemon may write mergeable tasks there from time to time.
> >
> > Then we won't need to introduce additional vm flags and to change
> > anon pagefault handler, and the changes will be small and only
> > related to mm/ksm.c, and good enough for both 32 and 64 bit machines.
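
A minimal sketch of the userspace side of this idea, assuming the proposed
force_madvise file existed. It is only a suggestion in this thread, not an
existing kernel interface, and the helper name ksm_mark_tasks is hypothetical
as well.

```shell
# Write each given PID to the target file, one write per task.
# $1: target file (hypothetical: /sys/kernel/mm/ksm/force_madvise)
# remaining arguments: PIDs of the tasks to mark as mergeable
ksm_mark_tasks() {
    target=$1
    shift
    for pid in "$@"; do
        # Stop at the first failed write (e.g. the file does not exist).
        echo "$pid" > "$target" || return 1
    done
}
```

A daemon could call this periodically, for example with the output of
pgrep for the applications it wants merged as the PID list.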
>
> Yup, looks appealing. Two concerns, though:
>
> 1) we are falling back to scanning through the list of tasks (I guess
> this is what we wanted to avoid, although this time it happens in the
> userspace);
>
> 2) what kinds of opt-out should we maintain? Like, what if force_madvise
> is called, but the task doesn't want some VMAs to be merged? This will
> require a new flag anyway, it seems. And should there be another
> write-only file to unmerge everything forcibly for specific task?
>
> Thanks.
>
> P.S. Cc'ing Pavel properly this time.
>
> --
> Best regards,
> Oleksandr Natalenko (post-factum)
> Senior Software Maintenance Engineer



--
Have a nice day,
Timofey.