Re: [PATCH] Allow UML kernel to run in a separate host address space

From: Linus Torvalds (torvalds@transmeta.com)
Date: Sat Dec 28 2002 - 15:50:53 EST


On Sat, 28 Dec 2002, Jeff Dike wrote:
>
> > What are the semantics the host code wants/needs,
>
> 1 - Multiple address spaces per process
> 2 - Ability to make a child switch between address spaces
> 3 - Ability to manipulate a child's address space (i.e. mmap, munmap, mprotect
> on an address space which is not current->mm)

Well, #3 falls under "ptrace()" as far as I'm concerned; I don't really
want to expose things through /proc (or /dev, which is even _worse_).

We used to have things that could be done with /proc/<pid>/mem, and it was
a total security disaster. It was removed in the 2.3.x series because of
that.

As to #1, that certainly shouldn't be a problem at all. We already do it
temporarily internally inside the kernel for execve() setup and for things
like lazy TLB switching for kernel threads, and there's nothing keeping us
from having multiple "struct mm_struct" per process. The only issue is
what the interfaces should be to create one (/dev/mm is right _out_), and
how to switch them around sanely.

Having a

        int fd = create_mm();

system call is certainly not wrong per se (but thinking that it should be
done using a special file is wrong - we don't have /dev/pipe either). And
creating that system call is trivial - but only worth it if there are good
sane interfaces to switch mm's around and do interesting things with them.

Done right, it should be possible to have "posix_spawn()" etc done using
something like that, ie

        /* Create new VM */
        int fd = create_mm();

        /* populate the dang thing.. */
        mmap_mm(fd, .. );

        /* start it up */
        clone_with_mm(fd, ...);

and the internal implementation should be perfectly trivial, since the
kernel already largely works this way internally anyway (yeah, it is
likely to need some re-organization of clone() to handle pre-created VM's
etc, but that's nothing really fundamental).
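
As a rough illustration of that create / populate / start sequence, here is
a toy userspace model. It is only a sketch under stated assumptions: the
toy_* names, the fixed-size handle table and the "running" flag are all
hypothetical and stand in for create_mm(), mmap_mm() and clone_with_mm();
none of it is a real kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; nothing here is a real kernel interface. */
#define TOY_MAX_MM   8
#define TOY_MAX_MAPS 4

struct toy_map { unsigned long addr; size_t len; };

struct toy_mm {
        int in_use;     /* slot handed out by toy_create_mm() */
        int running;    /* set once a task has been started on it */
        int nr_maps;
        struct toy_map maps[TOY_MAX_MAPS];
};

static struct toy_mm toy_mm_table[TOY_MAX_MM];

/* Stand-in for create_mm(): returns a small integer handle, or -1 */
int toy_create_mm(void)
{
        for (int fd = 0; fd < TOY_MAX_MM; fd++) {
                if (!toy_mm_table[fd].in_use) {
                        toy_mm_table[fd] = (struct toy_mm){ .in_use = 1 };
                        return fd;
                }
        }
        return -1;
}

/* Map a handle back to its toy mm, or NULL if the handle is not valid */
static struct toy_mm *toy_lookup(int fd)
{
        if (fd < 0 || fd >= TOY_MAX_MM || !toy_mm_table[fd].in_use)
                return NULL;
        return &toy_mm_table[fd];
}

/* Stand-in for mmap_mm(): records a mapping in the mm behind the handle */
int toy_mmap_mm(int fd, unsigned long addr, size_t len)
{
        struct toy_mm *mm = toy_lookup(fd);

        if (!mm || mm->nr_maps == TOY_MAX_MAPS)
                return -1;
        assert(mm->nr_maps >= 0 && mm->nr_maps < TOY_MAX_MAPS);
        mm->maps[mm->nr_maps++] = (struct toy_map){ addr, len };
        return 0;
}

/* Stand-in for clone_with_mm(): "starts" a task on the pre-built mm */
int toy_clone_with_mm(int fd)
{
        struct toy_mm *mm = toy_lookup(fd);

        if (!mm || mm->running)
                return -1;
        mm->running = 1;
        return 0;
}
```

The point of the model is only the ordering: the handle exists and can be
populated before anything runs on it, which is what a posix_spawn()-style
interface would need.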

> Beats me. My first suggestion was to add another file descriptor argument
> to mmap et al which would represent the address space to be modified. Alan
> didn't like that idea too much.

I do believe that fd's are a natural way to handle it, since it needs
_some_ kind of handle, and the only generic handle the kernel has is a
file descriptor. We could create a new kind of handle, but it would be
likely to be just more complexity.

HOWEVER, the part I worry about is creating tons of new system calls that
just duplicate existing ones by adding a "fd" argument. That part I really
don't much like. Because if this were to really be a generic feature, it
really wants pretty much _all_ system calls supported, ie things like

        fd = open(<mm,ptr>, flags, ...);

        retval = read(<mm,ptr>..

to allow the user to not just mmap but generally "take the guise of" any
other mm for the duration of the system call.

Which really means that I _think_ the right approach would be to literally
have an "indirect-system-call-using-this-mm" system call, which does
something like

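        asmlinkage long sys_mm_indirect(int fd, struct syscall_descriptor_block *user_args)
        {
                struct mm_struct *mm, *old_mm;
                struct syscall_descriptor_block args;
                long retval;

                if (copy_from_user(&args, user_args, sizeof(args)))
                        return -EFAULT;

                /* get_fd_mm()/put_mm() are sketched helpers: take/drop a ref on the fd's mm */
                mm = get_fd_mm(fd);
                if (!mm)
                        return -EBADF;
                old_mm = current->mm;
                current->mm = mm;
                switch_mm(mm);

                /* run the described system call while wearing the other mm */
                retval = arch_do_syscall(&args);

                current->mm = old_mm;
                switch_mm(old_mm);
                put_mm(mm);
                return retval;
        }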

which allows _any_ system call to be made for that mm.
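
The same swap / dispatch / restore pattern can be tried out in plain
userspace C. What follows is a minimal sketch, not kernel code: the demo_*
names, the two toy "system calls" and the single "brk" field are all
assumptions made up for illustration, with a global pointer standing in for
current->mm.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; none of these names are real kernel interfaces. */
struct demo_mm { long brk; };   /* stands in for struct mm_struct */

struct demo_sdb {               /* stands in for syscall_descriptor_block */
        int nr;                 /* which toy "system call" to run */
        long arg;
};

static struct demo_mm *demo_current_mm; /* stands in for current->mm */

/* Two toy "system calls"; both act on whatever demo_current_mm points at */
static long demo_sys_getbrk(long unused)
{
        (void)unused;
        assert(demo_current_mm != NULL);
        return demo_current_mm->brk;
}

static long demo_sys_setbrk(long brk)
{
        assert(demo_current_mm != NULL);
        demo_current_mm->brk = brk;
        return 0;
}

static long (*const demo_calls[])(long) = { demo_sys_getbrk, demo_sys_setbrk };
#define DEMO_NR_CALLS (sizeof(demo_calls) / sizeof(demo_calls[0]))

/* Run one described call "in the guise of" mm, then switch back */
long demo_mm_indirect(struct demo_mm *mm, const struct demo_sdb *args)
{
        struct demo_mm *old_mm = demo_current_mm;
        long retval;

        if (!mm || !args || args->nr < 0 || (size_t)args->nr >= DEMO_NR_CALLS)
                return -1;

        demo_current_mm = mm;
        retval = demo_calls[args->nr](args->arg);
        demo_current_mm = old_mm;
        return retval;
}
```

The toy calls never take an mm argument themselves, which is the whole
attraction: one indirect entry point covers every call in the table instead
of each call growing an "fd" variant.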

                Linus
