Re: [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint

From: Andrew Morton
Date: Tue Aug 16 2011 - 17:55:53 EST


On Mon, 15 Aug 2011 15:57:35 -0500
Will Drewry <wad@xxxxxxxxxxxx> wrote:

> This patch proposes a sysctl knob that allows a privileged user to
> disable ~VM_MAYEXEC tainting when mapping in a vma from a MNT_NOEXEC
> mountpoint. It does not alter the normal behavior resulting from
> attempting to directly mmap(PROT_EXEC) a vma (-EPERM) nor the behavior
> of any other subsystems checking MNT_NOEXEC.
>
> It is motivated by a common /dev/shm, /tmp use case. There are few
> facilities for creating a shared memory segment that can be remapped in
> the same process address space with different permissions. Often, a
> file in /tmp provides this functionality. However, on distributions
> that are more restrictive/paranoid, world-writeable directories are
> often mounted "noexec". The only workaround to support software that
> needs this behavior is to either not use that software or remount /tmp
> exec.

Remounting /tmp would appear to have the same effect as altering this
sysctl, so why not just remount /tmp?

> (E.g., https://bugs.gentoo.org/350336?id=350336) Given that
> the only recourse is using SysV IPC, the application programmer loses
> many of the useful ABI features that they get using a mmap'd file (and
> as such are often hesitant to explore that more painful path).
>
> With this patch, it would be possible to change the sysctl variable
> such that mprotect(PROT_EXEC) would succeed. In cases like the example
> above, an additional userspace mmap-wrapper would be needed, but in
> other cases, like how code.google.com/p/nativeclient mmap()s then
> mprotect()s, the behavior would be unaffected.
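
For reference, the mmap()-then-mprotect() sequence described above can be
probed from userspace with something like the sketch below. It is not part
of the patch: try_mmap_then_mprotect() and the scratch-file name are made
up for illustration. On a noexec mount the mprotect() call would be
expected to fail (with EACCES, as far as I can tell) for as long as
VM_MAYEXEC is cleared at mmap() time.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Map a scratch file created in `dir` read/write, then ask for exec
 * permission afterwards via mprotect().
 *
 * Returns 0 if mprotect() succeeded, the errno value if it failed,
 * or -1 if setup failed before mprotect() was reached.
 */
int try_mmap_then_mprotect(const char *dir)
{
	char path[4096];
	long page = sysconf(_SC_PAGESIZE);
	void *addr;
	int fd, n, ret;

	if (page <= 0)
		return -1;

	n = snprintf(path, sizeof(path), "%s/noexec-probe-XXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the open fd keeps the file alive */

	if (ftruncate(fd, page) < 0) {
		close(fd);
		return -1;
	}

	/* Step 1: plain read/write mapping, no PROT_EXEC requested. */
	addr = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	if (addr == MAP_FAILED)
		return -1;

	/* Step 2: request exec later; this is the call that needs VM_MAYEXEC. */
	ret = mprotect(addr, page, PROT_READ | PROT_EXEC) ? errno : 0;

	munmap(addr, page);
	return ret;
}
```

Calling try_mmap_then_mprotect("/tmp") before and after remounting /tmp
noexec should show the difference in behavior.
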
>
> The tradeoff is a loss of defense in depth, but it seems reasonable when
> the alternative is to disable the defense entirely.
>
> ...
>
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -89,6 +89,9 @@
> /* External variables not in a header file. */
> extern int sysctl_overcommit_memory;
> extern int sysctl_overcommit_ratio;
> +#ifdef CONFIG_MMU

The ifdef isn't needed around an extern declaration and we generally
omit it to avoid clutter.

AFAICT, this feature could also be made available on NOMMU systems?

> +extern int sysctl_mmap_noexec_taint;

The term "taint" has a specific meaning in the kernel (see
add_taint()). It's regrettable that this patch attaches a second
meaning to that term. Can we think of a better word to use?

A better word would communicate the sense of the sysctl operation. If
a "taint" flag is set to true, I don't know whether that means that
noexec is enabled or disabled. Something like
sysctl_mmap_noexec_override or sysctl_mmap_noexec_disable, perhaps.
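
To show what I mean by the name carrying the sense of the operation,
here is a small userspace model of the check. It is only a sketch and is
not the patch's code: model_noexec_flags() and its noexec_override
argument are hypothetical, and the VM_* values are copied from
include/linux/mm.h purely for illustration.

```c
#include <errno.h>

/* Mirrors of the kernel's vma flag bits, copied here for illustration. */
#define VM_EXEC    0x00000004UL
#define VM_MAYEXEC 0x00000040UL

/*
 * Hypothetical model of the MNT_NOEXEC handling at mmap() time.
 *
 * noexec_override == 0: default behavior, VM_MAYEXEC is stripped.
 * noexec_override == 1: VM_MAYEXEC is left alone, so a later
 *                       mprotect(PROT_EXEC) can succeed.
 *
 * A direct PROT_EXEC request on a noexec mount is refused either way.
 * Returns 0 and stores the resulting flags in *out, or -EPERM.
 */
int model_noexec_flags(unsigned long vm_flags, int on_noexec_mount,
		       int noexec_override, unsigned long *out)
{
	if (on_noexec_mount) {
		if (vm_flags & VM_EXEC)
			return -EPERM;
		if (!noexec_override)
			vm_flags &= ~VM_MAYEXEC;
	}
	*out = vm_flags;
	return 0;
}
```

With a name like "override", 1 plainly means the restriction has been
lifted, which is not something "taint" tells the reader.
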

This patch forgot to document the new feature and its sysctl.
Documentation/sysctl/vm.txt might be the right place.
