Re: RFC [patch 13/34] PID Virtualization Define new task_pid api

From: Kirill Korotaev
Date: Fri Feb 03 2006 - 05:49:56 EST

Next message: Alexey Kuznetsov: "Re: [RFC][PATCH 5/7] VPIDs: vpid/pid conversion in VPID enabled case"
Previous message: Pavel Machek: "Re: [ 00/10] [Suspend2] Modules support."
In reply to: Kirill Korotaev: "Re: RFC [patch 13/34] PID Virtualization Define new task_pid api"
Next in thread: Eric W. Biederman: "Re: RFC [patch 13/34] PID Virtualization Define new task_pid api"

I hope you understand that such things do not make anything
secure. The administrator of the node will always have access to /proc/kcore,
devices, KERNEL CODE(!), etc. There is no security from this point of view.

Only if they have CAP_SYS_RAWIO. I admit it takes a lot more
to get there than just that. But having a mechanism that has the
potential to be secured and is much simpler to understand
and to set up for minimal privileges than any of the other unix
addons I have seen is very interesting.

OK. I suppose it can be done as an option. If required, access from the host system can be allowed. If a "secure" environment is requested, the container is fully isolated.

3) Nesting of containers (so they are general purpose and not special hacks).

Why are you interested in nesting? Are there any applications for this?
Until everything is virtualized in a nested way (including the TCP/IP stack,
routing, etc.) I do not see much use for it.

For everything except the PID namespace I am just interested in having multiple
separate namespaces. For the PID namespace to keep the traditional unix
model you need a parent process so it is actually nesting.
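To make the nesting point concrete, here is a minimal sketch in C of a PID namespace that keeps a pointer to its parent, so that a task inside a guest also has a PID as seen from the host. All names here (pid_ns_sketch, task_sketch, ns_init, task_attach, task_pid_in, MAX_NS_DEPTH) are hypothetical and chosen only for illustration; this is not the API from the patch set under discussion.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NS_DEPTH 4  /* hypothetical limit, only for this sketch */

/* Hypothetical namespace: points at its parent; the host has no parent. */
struct pid_ns_sketch {
    struct pid_ns_sketch *parent;
    int depth;      /* 0 for the host namespace */
    int next_pid;   /* next free pid in this namespace */
};

/* Hypothetical task: one pid for each namespace level that can see it. */
struct task_sketch {
    struct pid_ns_sketch *ns;   /* innermost namespace of the task */
    int pids[MAX_NS_DEPTH];     /* pids[d] is the pid seen from depth d */
};

void ns_init(struct pid_ns_sketch *ns, struct pid_ns_sketch *parent)
{
    ns->parent = parent;
    ns->depth = parent ? parent->depth + 1 : 0;
    assert(ns->depth < MAX_NS_DEPTH);
    ns->next_pid = 1;
}

/* Allocate a pid in the task's own namespace and in every ancestor. */
void task_attach(struct task_sketch *t, struct pid_ns_sketch *ns)
{
    t->ns = ns;
    for (struct pid_ns_sketch *cur = ns; cur; cur = cur->parent)
        t->pids[cur->depth] = cur->next_pid++;
}

/* Pid of t as seen from viewer, or -1 if t is not visible from there. */
int task_pid_in(const struct task_sketch *t, const struct pid_ns_sketch *viewer)
{
    for (const struct pid_ns_sketch *cur = t->ns; cur; cur = cur->parent)
        if (cur == viewer)
            return t->pids[viewer->depth];
    return -1;
}
```

With one host and one guest namespace, the first task attached to the guest gets PID 1 inside the guest and an ordinary PID in the host, while host-only tasks are not visible from the guest at all. One level of nesting and deeper hierarchies use the same structure; only the depth differs.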

Yes, but nesting can be one level deep, as in OpenVZ, where a VPS is a nested namespace inside the host system, or it can be a fully isolated, separate traditional namespace.

By real nesting I mean hierarchical containers, where containers inside multiple containers are allowed. This is hard to implement. For example, for real nested containers you will need isolated TCP/IP stacks with complex routing rules, etc.

I am interested because it is easy, and because if it is possible then
the range of applications you can apply containers to is much
larger. At the far end of that spectrum is migrating a server running
on real hardware and bringing it up as a guest on a newer, much more
powerful machine, with the appearance that it had only been
unreachable for a few seconds.

You can use fully isolated containers like OpenVZ VPSs for this. They are naturally suited to it, because they isolate not only PIDs but also IPC, sockets, etc.

How can you migrate an application which consists of two processes doing IPC via signals? They are not tied together inside the kernel in any way, and there is no way to automatically detect that both should be migrated together.
VPSs are what provide this kind of boundary around what should be considered as a whole.
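As a rough illustration of the boundary argument, the sketch below treats the container ID as the only thing that groups tasks for migration: two processes that talk via signals share nothing else the kernel could inspect, but they do share a VPS. The names (mig_task, vps_id, collect_vps_tasks) are hypothetical and do not come from OpenVZ or from the patches in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record: a pid plus the VPS it belongs to (0 = host). */
struct mig_task {
    int pid;
    int vps_id;
};

/*
 * Collect pointers to every task that belongs to vps_id.
 * Stores at most out_cap pointers in out[] and returns the total number
 * of matching tasks, which may be larger than out_cap.
 */
size_t collect_vps_tasks(const struct mig_task *tasks, size_t n, int vps_id,
                         const struct mig_task **out, size_t out_cap)
{
    size_t found = 0;

    assert(tasks != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].vps_id != vps_id)
            continue;
        if (found < out_cap)
            out[found] = &tasks[i];
        found++;
    }
    return found;
}
```

The migration set is then simply everything the function returns for one VPS; no analysis of who signals whom is needed.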

The vserver way of solving some of these problems is to provide a way
to enter the guest. I would rather have some explicit operation that puts
you into the guest context so there is a single point where we can tackle
the nested security issues, than to have hundreds of places we have to
look at individually.

Huh, it sounds too easy. Just imagine that the VPS owner has deleted ps, top, kill,
bash and other tools. You won't be able to enter.

Entering is different from execing a process on the inside.
Implementation-wise, it is changing the context pointer on your task.

If I understand you correctly, it is a fully insecure way of doing things. If you change the context without applying all the restrictions that the VPS should imply, your process will be ptrace'able and so on.
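The two positions above can be sketched together in C: entering is a pointer switch on the task, and the objection is that restrictions have to be applied at that same point. In this minimal sketch the restriction is modelled as a capability mask that is narrowed before the pointer changes. Every name (ctx_sketch, entering_task, enter_ctx, the CAP_SKETCH_* bits) is hypothetical, and real restrictions would cover far more than a bit mask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits, only for this sketch. */
#define CAP_SKETCH_RAWIO (1u << 0)
#define CAP_SKETCH_KILL  (1u << 1)

/* Hypothetical container context; id 0 stands for the host. */
struct ctx_sketch {
    int id;
    unsigned allowed_caps;  /* capabilities a task may hold inside */
};

/* Hypothetical task: a context pointer plus its current capabilities. */
struct entering_task {
    const struct ctx_sketch *ctx;
    unsigned caps;
};

/*
 * Move a host task into a guest context.
 * Returns 0 on success, -1 if the caller is not in the host context.
 */
int enter_ctx(struct entering_task *t, const struct ctx_sketch *guest)
{
    assert(t != NULL && t->ctx != NULL && guest != NULL);
    if (t->ctx->id != 0)
        return -1;                   /* only the host may enter a guest */
    t->caps &= guest->allowed_caps;  /* apply the restrictions first */
    t->ctx = guest;                  /* then switch the context pointer */
    return 0;
}
```

Keeping the restriction and the pointer switch in one function gives the single point of control asked for earlier in the thread, and it makes the failure mode visible: dropping the mask line leaves a task inside the guest with host privileges.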

Another example: when the VPS owner
is near its resource limits, you won't be able to do anything after
entering the VPS.

For debugging this is a good reason for being inside. What if the
problem is that you are out of resources?

For debugging, yes; in production, no.
That is why OpenVZ allows the host system to access VPS resources: for debugging in production.

I have no intention of requiring monitoring to work from the inside though.
Do you need other examples?
No, I need to post patches.

Thanks a lot for the valuable discussion and your time!

Kirill
