Re: [PATCH 4/4] add ksm kernel shared memory driver.

From: Anthony Liguori
Date: Tue Mar 31 2009 - 12:51:35 EST


Andrea Arcangeli wrote:
> On Tue, Mar 31, 2009 at 10:54:57AM -0500, Anthony Liguori wrote:
>> You can still disable ksm and simply return ENOSYS for the MADV_ flag. You
>
> -EINVAL if anything, -ENOSYS would tell userland that it shall stop
> trying to use madvise, including the other MADV_ too.
>
>> could even keep it as a module if you liked by separating the madvise bits from the ksm bits. The madvise() bits could just provide the tracking infrastructure for determining which vmas were currently marked as sharable.
>> You could then have ksm as a loadable module that consumed that interface to then perform scanning.
>
> What's the point of making ksm a module if one has part of ksm code
> loaded in the kernel and not being possible to avoid compiling in?
> People that say KSM=N in their .config (like embedded running with 1M
> of ram), don't want that tracking overhead compiled into the kernel.

You have two things here. CONFIG_MEM_SHARABLE and CONFIG_KSM. CONFIG_MEM_SHARABLE cannot be a module. If it's set to =n, then madvise(MADV_SHARABLE) == -ENOSYS.

If CONFIG_MEM_SHARABLE=y, then madvise(MADV_SHARABLE) will keep track of all sharable memory regions. Independently of that, CONFIG_KSM can be set to n,m,y. It depends on CONFIG_MEM_SHARABLE and when it's loaded, it consumes the list of sharable vmas.
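
As a rough illustration of that split, here is a minimal userspace model in C. It is not kernel code and not taken from the patch: `MODEL_MEM_SHARABLE`, `model_madvise_sharable()`, `model_register_scanner()` and the fixed-size region table are all hypothetical names invented for this sketch. The tracking half is always compiled in, and the scanning half attaches later through a callback, the way a module loaded afterwards might consume the list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_MEM_SHARABLE (1 = y, 0 = n). */
#ifndef MODEL_MEM_SHARABLE
#define MODEL_MEM_SHARABLE 1
#endif

#define MODEL_MAX_REGIONS 16

struct sharable_region {
    unsigned long start;
    unsigned long len;
};

/* Tracking state: the "built-in" half of the model. */
static struct sharable_region regions[MODEL_MAX_REGIONS];
static size_t nr_regions;

/* Consumer hook: NULL until the scanner "module" registers itself. */
typedef void (*scan_fn)(const struct sharable_region *r, void *ctx);
static scan_fn scanner;

/* Models madvise(MADV_SHARABLE): record the range, or -ENOSYS if compiled out. */
int model_madvise_sharable(unsigned long start, unsigned long len)
{
#if MODEL_MEM_SHARABLE
    if (len == 0)
        return 0;               /* nothing to track */
    if (nr_regions == MODEL_MAX_REGIONS)
        return -ENOMEM;         /* table full (a limit of this sketch only) */
    regions[nr_regions].start = start;
    regions[nr_regions].len = len;
    nr_regions++;
    return 0;
#else
    (void)start;
    (void)len;
    return -ENOSYS;
#endif
}

/* Models loading the scanner: only one consumer at a time in this sketch. */
int model_register_scanner(scan_fn fn)
{
    assert(fn != NULL);
    if (scanner != NULL)
        return -EBUSY;
    scanner = fn;
    return 0;
}

/* Models unloading the scanner; tracked regions stay recorded. */
void model_unregister_scanner(void)
{
    scanner = NULL;
}

/* One scan pass: returns how many regions were handed to the consumer. */
size_t model_scan_once(void *ctx)
{
    size_t i;

    if (scanner == NULL)
        return 0;               /* tracking continues, nothing consumes it */
    for (i = 0; i < nr_regions; i++)
        scanner(&regions[i], ctx);
    return nr_regions;
}

/* Example consumer: sums the length of every tracked region into *ctx. */
void count_bytes(const struct sharable_region *r, void *ctx)
{
    *(unsigned long *)ctx += r->len;
}
```

In this model, regions registered before the consumer exists are still seen once it registers, which is the property the tracking/scanning split relies on.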

But honestly, CONFIG_MEM_SHARABLE shouldn't be a lot of code, so I don't see why you'd even need to make it configurable.

>> A number of MADV_ flags are Linux specific (like MADV_DOFORK/MADV_DONTFORK).
>
> But those aren't kernel module related, so they're in line with the
> standard ones and could be adapted by other OS.
>
> KSM is not a core VM functionality, madvise is a core VM
> functionality, so I don't see fit. KSM as ioctl or KSM creating
> /proc/<pid>/ksm when loaded, sounds fine to me instead. If open of
> either one fails, application won't register in. It's up to you to
> choose KSM=M/N, if you want it as core functionality just build as
> KSM=Y but leave the option to others to save memory.

The ioctl() interface is quite bad for what you're doing. You're telling the kernel extra information about a VA range in userspace. That's what madvise is for. You're tweaking simple read/write values of kernel infrastructure. That's what sysfs is for.
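
For illustration, a minimal sketch of what the userspace side of the madvise() approach could look like, with the error handling both sides of this thread are arguing about. The helper name `try_advise()` and its return convention are assumptions made for this example. The advice value is passed in by the caller, so no numeric value for MADV_SHARABLE is assumed here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: pass a hint about [addr, addr + len) to the kernel.
 * Returns  1 if the kernel accepted the advice,
 *          0 if the kernel does not know or support it (EINVAL or ENOSYS),
 *         -1 on any other error (errno is left set by madvise()).
 */
int try_advise(void *addr, size_t len, int advice)
{
    assert(addr != NULL);

    if (madvise(addr, len, advice) == 0)
        return 1;

    /* Treat both codes discussed in the thread as "feature unavailable". */
    if (errno == EINVAL || errno == ENOSYS)
        return 0;

    return -1;
}
```

A caller would typically treat a return of 0 as "carry on without sharing" rather than as a fatal error, since the hint is advisory.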

Regards,

Anthony Liguori