Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure

From: Herbert Poetzl
Date: Fri Mar 24 2006 - 15:59:02 EST


On Fri, Mar 24, 2006 at 01:28:49PM -0700, Eric W. Biederman wrote:
> Kirill Korotaev <dev@xxxxx> writes:
>
>
> >> I certainly have not. I do feel that developing this just from the
> >> top down is the wrong way to do this. In some of the preliminary
> >> patches we have found several pieces of code that we will have to
> >> touch that is currently in need of a cleanup. That is why I have
> >> been cleaning up /proc. sysctl is in need of similar treatment
> >> but is in less bad shape.
> > Eric, though I suggest to postpone proc and sysctl a bit, can you share
> > me your vision of /proc and /sysctl virtualization a bit?
> > A good way to handle them IMHO is to make fully virtual, i.e. each
> > namespace should have an own set of sysctl or proc tree.
>
> Roughly I agree. Some cases are easier than others. So let me take
> just the sysvipc case as an example.
>
> My thinking is move the calls for printing the sysvipc namespace
> from fs/proc/generic.c (with all of it's cool helpers) to
> fs/proc/base.c.
>
> So we wind up with:
> /proc/<pid>/sysvipc/msg
> /proc/<pid>/sysvipc/sem
> /proc/<pid>/sysvipc/shm
> /proc/sysvipc -> /proc/self/sysvipc
>
> For sysctl we add a method to fetch the address of
> the variable and perhaps a few other attributes,
> that method is passed a task structure.
>
> Then we can have per process instances of:
> /proc/<pid>/sys/sem
> /proc/<pid>/sys/shmall
> /proc/<pid>/sys/shmmax
> /proc/<pid>/sys/msgmax
> /proc/<pid>/sys/msgmni
> /proc/<pid>/sys/shmmni
> And a symlink at:
> /proc/sys that points to /proc/<pid>/sys
>
> Getting sysvipc to show up in a per process fashion is pretty
> easy. Getting the entire sys hierarchy to show up per process
> is a little harder simply because I think to do it cleanly requires
> help functions that I don't have yet. I have removed all of
> the internal dependence on magic inode numbers completely removing
> the hard coded inode numbers and putting sys looks doable.
>
> Does that sound like a reasonable model?

hmm, isn't per process a little extreme ... I know
what you want to accomplish but won't this lead to
a per process procfs? and, if you want to do per
process procfs, what would be the gain?

just my opinion ...

best,
Herbert

> >> Part of it is that I have stopped to look more closely at what
> >> other people are doing and to look at alternative implementations.
> > If you need any help with it in OpenVZ, feel free to ask. We have
> > broken-out patches for recent 2.6.16 kernel.
>
>
> >> One interesting thing I have manged to do is by using ptrace I have
> >> implemented enter for the existing filesystem namespaces without
> >> having to modify the kernel. This at least says that enter and
> >> debugging are two faces of the same coin.
> > Hmmm, strange claim/conclusion... /dev/kmem allows to change namespaces
> > also :) and even to obtain root priviliges if needed... :)
>
> True. However this is much less ugly then using /dev/kmem, and it is
> much closer to what applications like user mode linux do. The primary
> question in my mind was what should the permissions checks be when
> performing this kind of action. Using ptrace satisfied that.
>
> So I now have a bounding box for what enter should be able to do
> and what permissions it should take.
>
> > Eric, let's not compare approaches with inches :)
> > As you remember your PID namespaces doesn't suite us well... :(
>
> More discussion when the time is right. But I believe I have solved
> the fundamental incompatibility that we had. I asked you a question to
> confirm that a while ago, but I have not heard anything back.
>
> Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/