Re: How much of a mess does OpenVZ make? ;) Was: What can OpenVZ do?

From: Alexey Dobriyan
Date: Fri Mar 13 2009 - 15:28:16 EST


On Fri, Mar 13, 2009 at 10:27:54AM -0700, Linus Torvalds wrote:
> On Thu, 12 Mar 2009, Sukadev Bhattiprolu wrote:
>
> > Ying Han [yinghan@xxxxxxxxxx] wrote:
> > | Hi Serge:
> > | I made a patch based on Oren's tree recently which implement a new
> > | syscall clone_with_pid. I tested with checkpoint/restart process tree
> > | and it works as expected.
> >
> > Yes, I think we had a version of clone() with pid a while ago.
>
> Are people _at_all_ thinking about security?
>
> Obviously not.

For the record, OpenVZ has always had a CAP_SYS_ADMIN check on restore.
And the CAP_SYS_ADMIN check will be in the version to be sent out.

Not having it is one big security hole.
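
To make the shape of that check concrete, here is a rough userspace sketch in
Python. It is only an illustration, not OpenVZ code: in the kernel the test
would presumably be a capable(CAP_SYS_ADMIN) call at the restore entry point.
The names may_restore() and restore() are hypothetical; the sketch assumes the
CapEff line of /proc/<pid>/status and that CAP_SYS_ADMIN is capability bit 21,
as defined in <linux/capability.h>.

```python
CAP_SYS_ADMIN = 21  # capability bit number from <linux/capability.h>


def effective_caps(status_text):
    """Parse the CapEff bitmask out of /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def may_restore(status_text):
    """True if the effective capability set includes CAP_SYS_ADMIN."""
    return bool(effective_caps(status_text) & (1 << CAP_SYS_ADMIN))


def restore(image, status_text):
    """Hypothetical restore entry point: refuse before touching any state."""
    if not may_restore(status_text):
        raise PermissionError("restore requires CAP_SYS_ADMIN")
    return "restored %s" % image


if __name__ == "__main__":
    # Linux-only: inspect the capabilities of the current process.
    with open("/proc/self/status") as f:
        print("may restore:", may_restore(f.read()))
```

The point is only that the privilege test comes first and fails closed, before
any part of the image is interpreted.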

> There's no way we can do anything like this. Sure, it's trivial to do
> inside the kernel. But it also sounds like a _wonderful_ attack vector
> against badly written user-land software that sends signals and has small
> races.
>
> Quite frankly, from having followed the discussion(s) over the last few
> weeks about checkpoint/restart in various forms, my reaction to just about
> _all_ of this is that people pushing this are pretty damn borderline.
>
> I think you guys are working on all the wrong problems.
>
> Let's face it, we're not going to _ever_ checkpoint any kind of general
> case process. Just TCP makes that fundamentally impossible in the general
> case, and there are lots and lots of other cases too (just something as
> totally _trivial_ as all the files in the filesystem that don't get rolled
> back).

What do you mean here? Unlinked files?

> So unless people start realizing that
> (a) processes that want to be checkpointed had better be ready and aware
> of it, and help out

This is not going to happen. Userspace authors won't do anything
(nor should they).

> (b) there's no way in hell that we're going to add these kinds of
> interfaces that have dubious upsides (just teach the damn program
> you're checkpointing that pids will change, and admit to everybody
> that people who want to be checkpointed need to do work) and are
> potential security holes.

I personally don't understand why on earth clone_with_pid() is with us
again.

As if pids are somehow unique among other resources.

This already came up when creating IPC objects with specific parameters
was discussed.

"struct pid" and "struct pid_namespace" can be trivially restored
without leaking to userspace.

People probably assume that a task should be restored with clone(2), which
is unnatural given the relations between task_struct, nsproxy and the
individual struct foo_namespace's.

> (c) if you are going to play any deeper games, you need to have
> privileges. IOW, "clone_with_pid()" is ok for _root_, but not for
> some random user. And you'd better keep that in mind EVERY SINGLE
> STEP OF THE WAY.
>
> I'm really fed up with these discussions. I have seen almost _zero_
> critical thinking at all. Probably because anybody who is in the least
> doubtful about it simply has tuned out the discussion. So here's my input:
> start small, start over, and start thinking about other issues than just
> checkpointing.