Re: [PATCH] - support inheritance of mlocks across fork/exec V2

From: Matt Mackall
Date: Tue Dec 09 2008 - 15:42:43 EST


On Tue, 2008-12-09 at 14:40 -0500, Lee Schermerhorn wrote:
> On Mon, 2008-12-08 at 15:33 -0600, Matt Mackall wrote:
> > On Mon, 2008-12-08 at 16:05 -0500, Lee Schermerhorn wrote:
> > > > > In support of a "lock prefix command"--e.g., mlock <cmd> <args> ...
> > > > > Analogous to taskset(1) for cpu affinity or numactl(8) for numa memory
> > > > > policy.
> > > > >
> > > > > Together with patches to keep mlocked pages off the LRU, this will
> > > > > allow users/admins to lock down applications without modifying them,
> > > > > if their RLIMIT_MEMLOCK is sufficiently large, keeping their pages
> > > > > off the LRU and out of consideration for reclaim.
> > > > >
> > > > > Potentially useful, as well, in real-time environments to force
> > > > > prefaulting and residency for applications that don't mlock themselves.
> >
> > This is a bit scary to me. Privilege and mode inheritance across
> > processes is the root of many nasty surprises, security and otherwise.
>
> Hi, Matt:
>
> Could you explain more about this issue? I believe that the patch
> doesn't provide any privileges or capabilities that a process doesn't
> already have. It just allows one to cause a process and, optionally,
> its descendants, to behave as if one had modified the sources to invoke
> mlockall() directly. It is still subject to each individual task's
> resource limits. At least, that was my intent.

Again, it's about inheriting *something* across processes. This
historically creates surprises. When the thing being inherited is a
privilege, the surprises are security holes. I can't tell you what the
surprises are, only that I expect there to be some.

mlockall is not a privilege in itself, but it is a non-standard mode of
operation that most processes are not designed for or expecting, and it
does relate to the handling of a finite resource with system stability
implications. If the thing that turns on this mode is buried inside some
poorly written app that decides it needs to mlockall somewhere, the
'surprise' can occur far away - in the child of a child of a thread an
hour later. And the surprise can be fatal - all memory eaten up, because
our process just happened to run without an rlimit.
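
To make the failure mode concrete, here is a minimal sketch of the kind of guard a wrapper could apply before turning the mode on. It is not taken from the patch; the helper names limit_is_bounded() and mlockall_if_bounded() are made up for illustration, and it assumes the only safety net is a finite RLIMIT_MEMLOCK soft limit.

```c
#include <sys/mman.h>
#include <sys/resource.h>

/* Hypothetical helper: 1 if the limit is finite, 0 if it is unlimited. */
static int limit_is_bounded(rlim_t lim)
{
    return lim != RLIM_INFINITY;
}

/*
 * Hypothetical helper: lock current and future mappings only when
 * RLIMIT_MEMLOCK is finite. Returns 0 on success, -1 otherwise.
 * Note: a task holding CAP_IPC_LOCK is not constrained by this limit,
 * so the check is a sanity test, not a guarantee.
 */
static int mlockall_if_bounded(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (!limit_is_bounded(rl.rlim_cur))
        return -1;  /* refuse: nothing would cap the locked memory */
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

A check like this only covers the process that makes the call. Once the mode is inherited, every descendant would need the same care, which is the part that tends to get forgotten.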

I've seen this with RT. An app temporarily elevates itself to RT, and
unrelatedly forks another process. The second short-lived process kicks
off a daemon that sometime later consumes all CPU in a busy loop (in
this case waiting on I/O that never happens because everything else is
shut out).
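
For the RT case, the usual defence is for the child to drop back to the default policy between fork() and exec(). A rough sketch, assuming a Linux/POSIX system; reset_to_normal_sched() is a hypothetical name, not an existing API:

```c
#include <sched.h>

/*
 * Hypothetical helper: put the calling task back on the default
 * time-sharing policy so an inherited real-time policy does not leak
 * into whatever it execs next. Returns 0 on success, -1 on error.
 */
static int reset_to_normal_sched(void)
{
    struct sched_param sp = { .sched_priority = 0 };

    /* pid 0 means "the calling task"; SCHED_OTHER requires priority 0. */
    return sched_setscheduler(0, SCHED_OTHER, &sp);
}
```

This only helps if whoever writes the fork() remembers to call it, which is exactly what did not happen in the case above.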

Doing it as a container parameter means that you explicitly recognize
that 'everything in the container' gets this mode. And you've probably
also given a thought to 'how big is this container' and the like as
well.

> As far as "what I'm trying to do": I see this as exactly like using
> taskset to set the cpu affinity of a task, or numactl to set the task
> mempolicy without modifying the source.

Oh sure, I completely get that and I think it's a useful notion. And I
think the above analogous interfaces have more or less the same issues,
except that mlockall is a much older and more widely used API.
Containers are a better match here too, but the above predate
containers.

> If one had access to the
> program source and wouldn't, e.g., void a support contract by modifying
> it, one could just insert the calls into the source itself.

Huh? We would of course set up a container 'from the outside'. By
comparison, your mlockall() call traditionally operates 'from the
inside', and you're proposing to add a flag and a helper program that
makes it work 'from the outside' too. Which is a bit hackish.

--
Mathematics is the supreme nostalgia of our time.
