Re: Finally the 2.2.0 is out of sight :-).

Malcolm Beattie (mbeattie@sable.ox.ac.uk)
Wed, 27 Jan 1999 15:13:55 +0000 (GMT)


Michael Elizabeth Chastain writes:
> > Presumably your user-mode kernel would want to run binaries
> > for the same arch it was built for, and have a way to reroute
> > syscalls to the pseudo-kernel rather than the real kernel.
>
> PTRACE_SYSCALL. annul the syscall. do what you want to the target
> process.
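
The PTRACE_SYSCALL approach quoted above can be sketched roughly as follows. This is an illustration, not code from either poster. It assumes x86-64 Linux with a modern kernel (on 1999 i386 the registers would be orig_eax/eax). The names traced_child, run_syscall_supervisor and FAKE_GETPPID are hypothetical, and getppid() is just a convenient stand-in for "the syscall the pseudo-kernel wants to handle". At syscall entry the tracer sets orig_rax to -1 so the real kernel skips the call, and at the matching exit it writes its own return value into rax.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#if !defined(__linux__) || !defined(__x86_64__)
#error "sketch assumes x86-64 Linux register names (orig_rax, rax)"
#endif

#define FAKE_GETPPID 4242   /* hypothetical value the supervisor "returns" */

/* The traced process: exits 0 only if its getppid() was answered by the tracer. */
static void traced_child(void)
{
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
        _exit(2);                         /* tracing unavailable: don't stop */
    raise(SIGSTOP);                       /* wait here until the tracer is ready */
    long r = syscall(SYS_getppid);        /* this call gets annulled */
    _exit(r == FAKE_GETPPID ? 0 : 1);
}

/* Kill and reap the child after a tracer-side failure. */
static int give_up(pid_t pid)
{
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return -1;
}

/* Returns the child's exit status (0 = emulation worked) or -1 on failure. */
static int run_syscall_supervisor(void)
{
    int status, in_syscall = 0, annulled = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        traced_child();

    if (waitpid(pid, &status, 0) < 0)
        return give_up(pid);
    if (!WIFSTOPPED(status))
        return -1;                        /* child already exited: TRACEME failed */
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)PTRACE_O_TRACESYSGOOD) != 0)
        return give_up(pid);

    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) != 0 || waitpid(pid, &status, 0) < 0)
            return give_up(pid);
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (!WIFSTOPPED(status))
            return -1;                    /* child was killed by a signal */
        if (WSTOPSIG(status) != (SIGTRAP | 0x80))
            continue;                     /* not a syscall stop; signal dropped */

        struct user_regs_struct regs;
        if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) != 0)
            return give_up(pid);
        if (!in_syscall && regs.orig_rax == SYS_getppid) {
            regs.orig_rax = (unsigned long long)-1;   /* annul: kernel skips it */
            annulled = ptrace(PTRACE_SETREGS, pid, NULL, &regs) == 0;
        } else if (in_syscall && annulled) {
            regs.rax = FAKE_GETPPID;      /* "do what you want to the target" */
            if (ptrace(PTRACE_SETREGS, pid, NULL, &regs) != 0)
                return give_up(pid);
            annulled = 0;
        }
        in_syscall = !in_syscall;         /* stops alternate entry/exit */
    }
}
```

Note that this needs one tracer per traced process (or one tracer juggling several), which is exactly the objection raised in point (1) below.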

For the uvm "port" I'm doing (see my other message in this thread)
I considered that, especially as it doesn't need any changes to the
hosting kernel. However a couple of things made me change my mind:
(1) I'm using multiple real processes to provide the multiple mm
contexts that the uvm kernel needs for its tasks. Having a
ptracing process for each of those (or trying to have one cope
with multiple processes) seemed too ugly.
(2) In order to provide a proper "supervisor-mode" trap and prevent
a user-mode process living in the uvm from accessing its own
kernel's memory, you need to unprotect/protect the uvm kernel's
address space on the way in/out of the uvm kernel.
I consider my solution to be fairly clean and also useful for various
other things such as sandbox supervisors. I wrote such a supervisor
program quite a while ago based on ptrace but I think this way is
cleaner. It adds a couple of flags to task->flags (if that's
possible and there's room) or to task_struct itself: UVM_ENABLE and
UVM_SIGSYS. (I haven't actually written this patch yet: I was going to
wait until the uvm vmlinux actually linked and ran for the first
time :-).) An addition to system_call in entry.S (or the equivalent for
other architectures) tests flags(%ebx), just as it already does for
PF_TRACESYS. If UVM_SIGSYS is not set, do the system call as normal.
If UVM_SIGSYS is set, then do the kernel equivalent of
    mprotect(UVM_PAGE_OFFSET, PAGE_OFFSET - UVM_PAGE_OFFSET,
             PROT_READ|PROT_WRITE|PROT_EXEC);
followed by
    send_sig(SIGSYS, current, 1); /* let sig handler deal with it */

(If desired, UVM_PAGE_OFFSET could be a variable held in a field of
task_struct too.) An addition to send_sig would do
    current->flags &= ~UVM_SIGSYS; /* re-allow ordinary syscalls */
An addition to sigreturn() would do
    current->flags |= UVM_SIGSYS;
so that, on return back to uvm "user-mode", system calls would trap
once again.
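
To make the flag lifecycle concrete, here is a small user-space model of it. It is only a sketch under stated assumptions, not the kernel patch itself: uvm_flags stands in for current->flags, a single anonymous page stands in for the range [UVM_PAGE_OFFSET, PAGE_OFFSET), raise() stands in for send_sig(), and the code after raise() returns plays the part of the sigreturn() hook. All names here are hypothetical, and the model uses PROT_READ|PROT_WRITE rather than adding PROT_EXEC.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-ins for the proposed kernel state (user-space model only). */
#define UVM_SIGSYS 0x1u

static unsigned int uvm_flags;                  /* models current->flags */
static char *uvm_kernel_area;                   /* models the uvm kernel's pages */
static size_t uvm_kernel_len;
static volatile sig_atomic_t uvm_traps;         /* traps handled so far */
static volatile sig_atomic_t flag_in_handler = -1;

/* The uvm kernel's syscall entry point runs as the SIGSYS handler. */
static void uvm_sigsys_handler(int sig)
{
    (void)sig;
    uvm_kernel_area[0]++;   /* faults unless the area was unprotected first */
    flag_in_handler = (uvm_flags & UVM_SIGSYS) ? 1 : 0;
    uvm_traps++;
}

/* Map the "kernel" area PROT_NONE, install the handler, enable trapping. */
static int uvm_setup(void)
{
    struct sigaction sa;
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    uvm_kernel_len = (size_t)page;
    uvm_kernel_area = mmap(NULL, uvm_kernel_len, PROT_NONE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (uvm_kernel_area == MAP_FAILED)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = uvm_sigsys_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSYS, &sa, NULL) != 0)
        return -1;
    uvm_flags |= UVM_SIGSYS;
    return 0;
}

/* Models the entry.S test. Returns 1 if diverted to the uvm kernel,
   0 for an ordinary syscall, -1 on error. */
static int model_syscall_entry(void)
{
    if (!(uvm_flags & UVM_SIGSYS))
        return 0;                                   /* normal system call path */
    if (mprotect(uvm_kernel_area, uvm_kernel_len, PROT_READ | PROT_WRITE) != 0)
        return -1;                                  /* unprotect on the way in */
    uvm_flags &= ~UVM_SIGSYS;                       /* send_sig hook */
    raise(SIGSYS);                                  /* handler runs before return */
    uvm_flags |= UVM_SIGSYS;                        /* sigreturn hook */
    if (mprotect(uvm_kernel_area, uvm_kernel_len, PROT_NONE) != 0)
        return -1;                                  /* protect on the way out */
    return 1;
}
```

Inside the handler the flag is clear, so the uvm kernel's own libc calls would go straight to the host kernel; once the handler returns, the flag is set again and the kernel area is inaccessible to uvm "user-mode" code.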

> > Worse yet, I see no obvious way for this 'user-mode' kernel to do
> > the necessary memory remapping it would need for its processes.
> > Being run inside of the same real process, they would have the
> > same view on memory.

You can use mmap() to map memory pretty much however you like. It
won't be fast, though, since you end up with a vma for each mapped 4K
page. On the other hand, the AVL tree mm code went back into 2.2, so
large numbers of vmas are at least handled reasonably.
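
A minimal sketch of that kind of per-page remapping, assuming the uvm's "physical memory" is an unlinked temporary file and each uvm virtual page is placed with mmap(MAP_FIXED) at some page-aligned offset in it. The helper names make_phys_mem and map_page, and the /tmp path, are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: back the uvm's "physical memory" with a temp file. */
static int make_phys_mem(size_t npages, size_t page)
{
    char path[] = "/tmp/uvm-physXXXXXX";
    assert(npages > 0 && page > 0);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                         /* file lives on while fd is open */
    if (ftruncate(fd, (off_t)(npages * page)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map virtual page vpage of a uvm task onto "physical" page ppage.
   Unless neighbouring pages happen to map contiguous file offsets,
   every call leaves the host kernel with one more vma. */
static int map_page(char *virt_base, size_t vpage, int phys_fd, size_t ppage, size_t page)
{
    void *want = virt_base + vpage * page;
    void *got = mmap(want, page, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
                     phys_fd, (off_t)(ppage * page));
    return got == want ? 0 : -1;
}
```

For example, mapping virtual page 0 to physical page 2 and virtual page 1 to physical page 0 gives two separate vmas, and a write through a direct mapping of the file shows up at the remapped virtual address.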

> I have no clue how to handle things like device drivers, but I have
> some of the puzzle pieces for the syscall filtering layer for a UMK.

My other message sketches which device drivers I'm doing. Since you
can use libc and ordinary system calls in uvm kernel mode, it's a lot
easier to mimic devices by moving blocks and data around between the
uvm kernel and other "real" processes than by bashing on real
hardware from a real kernel.
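
As a rough illustration of that point, a uvm block device can be little more than pread()/pwrite() on a host disk image. This is a sketch, not the driver from the other message; struct uvm_blkdev, uvm_blk_attach, uvm_blk_read, uvm_blk_write and the 512-byte sector size are all hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define UVM_SECTOR_SIZE 512   /* hypothetical sector size for this sketch */

/* Hypothetical uvm block device backed by an ordinary host file. */
struct uvm_blkdev {
    int fd;            /* descriptor on the disk image */
    off_t nsectors;    /* whole sectors in the image */
};

/* Open a disk image and size the device from it; returns 0 or -errno. */
static int uvm_blk_attach(struct uvm_blkdev *dev, const char *image_path)
{
    struct stat st;
    dev->fd = open(image_path, O_RDWR);
    if (dev->fd < 0)
        return -errno;
    if (fstat(dev->fd, &st) != 0) {
        int err = errno;
        close(dev->fd);
        return -err;
    }
    dev->nsectors = st.st_size / UVM_SECTOR_SIZE;
    return 0;
}

/* "Device I/O" is just an ordinary host system call: no hardware involved. */
static int uvm_blk_read(const struct uvm_blkdev *dev, off_t sector, void *buf)
{
    assert(buf != NULL);
    if (sector < 0 || sector >= dev->nsectors)
        return -EIO;
    ssize_t n = pread(dev->fd, buf, UVM_SECTOR_SIZE, sector * UVM_SECTOR_SIZE);
    return n == UVM_SECTOR_SIZE ? 0 : -EIO;
}

static int uvm_blk_write(const struct uvm_blkdev *dev, off_t sector, const void *buf)
{
    assert(buf != NULL);
    if (sector < 0 || sector >= dev->nsectors)
        return -EIO;
    ssize_t n = pwrite(dev->fd, buf, UVM_SECTOR_SIZE, sector * UVM_SECTOR_SIZE);
    return n == UVM_SECTOR_SIZE ? 0 : -EIO;
}
```

The same shape works for other devices: replace the image file with a pipe or socket to another real process acting as the "hardware".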

--Malcolm

-- 
Malcolm Beattie <mbeattie@sable.ox.ac.uk>
Unix Systems Programmer
Oxford University Computing Services
