Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
From: Andrea Arcangeli
Date: Wed May 06 2009 - 09:37:33 EST
On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
> I'm very much with those who suggested an madvise(), for which Chris
> prepared a patch. I know Andrea felt uneasy with an madvise() going
> to a possibly-configured-out-or-never-loaded module, but it is just
> advice, so I don't have a problem with that myself, so long as it
> is documented in the manpage.
I don't have so much of a problem with that, but there are a couple of
differences: normally madvise doesn't depend on the admin to start
some kernel thread to be meaningful, and normally madvise isn't a
privileged operation, see below.
> Whereas I do worry just what capability should be required for this:
> can't a greedy app simply fork itself, touch all its pages, and thus
> lock itself into memory in this way? And I do worry about the cpu
KSM pages are supposed to be swappable in the long run so let's think
longer term.
> cost of all the scanning, if it were to get used more generally -
> it would be a pity if we just advised complainers to tune it out.
Clearly if tons of apps maliciously register themself in ksm, they'll
waste tons of CPU for no good, they'll just populate the unstable tree
with pages that are all equal except for the last 4 bytes slowing down
KSM for nothing. This is also why it's good to have a /dev/ksm ioctl
that the admin can allow only certain users to use for registering
virtual ranges (for example only the kvm/qemu user or all users in
scientific environments). Otherwise we'd need some kind of permissions
settings in sysfs with some API that certainly is less intuitive than
chown/chmod on /dev/ksm. We just can't allow madvise to succeed on any
luser registering itself in KSM, so if it was madvise, it shall return
-EPERM somehow sometime.
> I'm still working my way through ksm.c, and not gone back to look at
> Chris's madvise patch, but doubt it will be sufficient. There's an
> interesting difference between what you're doing in ksm.c, and how
> madvise usually behaves, regarding unmapped areas: madvice doesn't
> usually apply to an unmapped area, and goes away with an area when
> it is unmapped; whereas in KSM's case, the advice applies to whatever
> happens to get mapped in the area specified, persisting across unmaps.
Given the apps using KSM tends to be quite special, the fact it's
sticky, it doesn't go away with munmap isn't big deal, quite to the
contrary those apps will likely have an easier time thanks to the
registration not going away over munmap/mmap, without requiring
reloading of malloc/new calls.
To skip over holes during virtual scans we just vma->vm_next.
> But I do appreciate the separation you've kept so far,
> and wouldn't want to tie it all together too closely.
The above plus the fact it remains self contained without making the
VM any more complicated, gives some value. Even swapping I'd like to
add it without VM specific knowledge about KSM. tmpfs has an easier
time because it has its own vma type, here we've KSM pages mixed
inside regular anonymous !vma->vm_file regions and !vm_ops.
> p.s. I wish you'd chosen different name than KSM - the kernel
> has supported shared memory for many years - and notice ksm.c itself
> says "Memory merging driver". "Merge" would indeed have been a less
> ambiguous term than "Share", but I think too late to change that now
> - except possibly in the MADV_ flag names?
I don't actually care about names, so I leave this to other to discuss.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/