I hope you understand that such things do not make anything secure.
The administrator of the node will always have access to /proc/kcore,
devices, KERNEL CODE(!) etc. No security from this point of view.

Only if they have CAP_SYS_RAWIO. I admit it takes a lot more
to get there than just that. But having a mechanism that has the
potential to be secured and is much simpler to understand
and to set up for minimal privileges than any of the other unix
addons I have seen is very interesting.

ok. I suppose it can be done as an option. If required, access from the
host system can be allowed. If a "secure" environment is requested -
fully isolated.

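To make the CAP_SYS_RAWIO remark concrete, here is a minimal sketch of how one might check from user space whether a process holds that capability. It assumes a Linux /proc layout with a CapEff line in /proc/<pid>/status; the helper names are made up for illustration, and the capability number 17 is taken from <linux/capability.h>.

```python
CAP_SYS_RAWIO = 17  # capability number as defined in <linux/capability.h>


def has_capability(cap_mask_hex: str, cap: int) -> bool:
    """Return True if bit `cap` is set in a hex capability mask."""
    return bool(int(cap_mask_hex, 16) & (1 << cap))


def effective_caps(status_text: str) -> str:
    """Extract the CapEff hex mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split()[1]
    raise ValueError("no CapEff field found")


def current_process_has_rawio() -> bool:
    """Check the calling process (Linux only; reads /proc/self/status)."""
    with open("/proc/self/status") as f:
        return has_capability(effective_caps(f.read()), CAP_SYS_RAWIO)
```

A process whose effective set lacks this bit is what "minimal privileges" would look like for this particular access path; it says nothing about the other ways an administrator can reach kernel memory.
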
3) Nesting of containers (so they are general purpose and not special
hacks).

Why are you interested in nesting? Any applications for this?
Until everything is virtualized in a nesting way (including the TCP/IP
stack, routing etc.) I see not much use for it.

For everything except the PID namespace I am just interested in having
multiple separate namespaces. For the PID namespace, to keep the
traditional unix model you need a parent process, so it is actually
nesting.

Yes, but nesting can be one level as in OpenVZ, where a VPS is a nested
namespace inside the host system, or it can be a fully isolated separate
traditional namespace.

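The remark that a PID namespace needs a parent process, and is therefore nested, can be pictured with a toy model. This is only a sketch under assumptions: the class and function names are hypothetical, and it does not claim to describe how any kernel or OpenVZ implements this. It assumes that a process created in a namespace is also given a pid in every enclosing namespace, so that its parent outside can still refer to it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PidNamespace:
    """Hypothetical model of a PID namespace with an optional parent."""
    name: str
    parent: Optional["PidNamespace"] = None
    next_pid: int = 1

    def alloc(self) -> int:
        """Hand out the next free pid in this namespace."""
        pid = self.next_pid
        self.next_pid += 1
        return pid

    def chain(self) -> List["PidNamespace"]:
        """This namespace followed by all of its ancestors."""
        out, ns = [], self
        while ns is not None:
            out.append(ns)
            ns = ns.parent
        return out


def spawn(ns: PidNamespace) -> Dict[str, int]:
    """Create a process in `ns`; it gets one pid per enclosing namespace."""
    return {level.name: level.alloc() for level in ns.chain()}
```

In this model the first process spawned in a guest is pid 1 from the guest's point of view while still having an ordinary pid in the host, which is the one-level nesting described for OpenVZ; adding a further level only means a longer chain.
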
I am interested because it is easy, and because if it is possible then
the range of applications you can apply containers to is much
larger. At the far end of that spectrum is migrating a server running
on real hardware and bringing it up as a guest on a newer, much more
powerful machine, with the appearance that it had only been
unreachable for a few seconds.

You can use fully isolated containers like OpenVZ VPSs for this. They
are naturally suitable for this, because they provide not only PID
isolation, but also IPC, sockets, etc.

The vserver way of solving some of these problems is to provide a way
to enter the guest. I would rather have some explicit operation that puts
you into the guest context so there is a single point where we can tackle
the nested security issues, than to have hundreds of places we have to
look at individually.

Huh, it sounds too easy. Just imagine that the VPS owner has deleted ps,
top, kill, bash and other tools. You won't be able to enter.

Entering is different from execing a process on the inside.
Implementation wise it is changing the context pointer on your task.

If I understand you correctly it is a fully insecure way of doing things.
After changing context without applying all the restrictions which
should be implied by the VPS, your process will be ptrace'able and so on.

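The disagreement about entering can be illustrated with a small model. This is a sketch only, not either poster's implementation: Context, Task, enter and can_ptrace are hypothetical names, and the "dumpable" flag is an assumed stand-in for whatever restrictions a guest should imply. It contrasts a bare pointer swap with a single explicit enter operation that applies restrictions first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical stand-in for a host or guest context."""
    name: str


@dataclass
class Task:
    """Hypothetical stand-in for a task with a context pointer."""
    pid: int
    context: Context
    dumpable: bool = True  # assumed flag: may peers in the same context trace us?


HOST = Context("host")


def can_ptrace(tracer: Task, target: Task) -> bool:
    """Assumed rule: tracing needs a shared context and a dumpable target."""
    return tracer.context is target.context and target.dumpable


def enter(task: Task, guest: Context) -> None:
    """The one explicit operation that moves a task into a guest."""
    # Apply restrictions before the switch, so there is one place to audit.
    task.dumpable = False
    # The enter itself: change the context pointer on the task.
    task.context = guest
```

A task moved with enter() is not traceable by guest processes in this model, while one whose context pointer is simply reassigned is, which is the insecure case objected to above.
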
Another example: when the VPS owner is near its resource limits, you
won't be able to do anything after entering the VPS.

For debugging this is a good reason for being inside. What if the
problem is that you are out of resources?
I have no intention of requiring monitoring to work from the inside
though.

Debugging - yes, in production - no.

Do you need other examples?

No, I need to post patches.

Thanks a lot for the valuable discussion and your time!