Re: Differences between FreeBSD and Linux system call mechanism

Richard B. Johnson (root@chaos.analogic.com)
Thu, 3 Sep 1998 18:08:06 -0400 (EDT)


On 3 Sep 1998, H. Peter Anvin wrote:

> Followup to: <Pine.LNX.3.95.980903141003.347A-100000@chaos.analogic.com>
> By author: "Richard B. Johnson" <root@chaos.analogic.com>
> In newsgroup: linux.dev.kernel
> > >
> > [SNIPPED]
> >
> > When you are in user-mode in an ix86, there is no such thing as a
> > "jump" to the kernel. You either "trap" to the kernel, you execute a
> > call through a call-gate, or you load a TSS to execute a task-switch to
> > kernel-mode code.
> >
>
> The SYSENTER instruction is pretty much a jump to the kernel.
>
> > In any event, your task ceases to exist when the kernel executes. If it
> > were not for "features" provided by the kernel such as items that persist
> > (files, pipes, shared memory, etc), isolation between the kernel and user-
> > mode tasks is perfect because they don't exist at the same time.
>
> It doesn't "cease to exist". Switching to kernel space does NOT cause
> a task switch. Rather, we say that task XXXX is executing in kernel
> mode. This may sound like a nitpick, but is a very important
> difference when you do kernel-mode programming.
>

This is pretty much a nit-pick of semantics. The user's code becomes
just a bunch of data. Since kernel mode can, in principle, access anything
in the machine, it can certainly do anything. However, it does not,
in general, execute the user's code.
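
To make "the user's code becomes just a bunch of data" concrete, here is
a minimal sketch of how a trapped task might look from the kernel side:
a saved register frame plus a dispatch table. Every name in it
(saved_regs, toy_table, toy_dispatch, the two toy handlers) is
hypothetical and is not taken from the Linux source. Only the general
convention is assumed: request number in EAX, arguments in EBX/ECX/EDX,
result returned in EAX, and -ENOSYS for an unknown request.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical saved user-mode register frame (illustrative layout). */
struct saved_regs {
    long eax;            /* request number on entry, result on exit */
    long ebx, ecx, edx;  /* first three arguments */
    unsigned long eip;   /* where the interrupted task will resume */
};

typedef long (*handler_fn)(long, long, long);

/* Two toy "kernel services"; real handlers would do real work. */
static long toy_add(long a, long b, long c) { (void)c; return a + b; }
static long toy_max(long a, long b, long c) { (void)c; return a > b ? a : b; }

static const handler_fn toy_table[] = { toy_add, toy_max };

/* Serve the request described by the saved frame. The task's own code
 * is never executed here; only its saved register values are read. */
void toy_dispatch(struct saved_regs *regs)
{
    size_t count = sizeof toy_table / sizeof toy_table[0];

    assert(regs != NULL);
    if (regs->eax < 0 || (size_t)regs->eax >= count) {
        regs->eax = -ENOSYS;   /* unknown request number */
        return;
    }
    regs->eax = toy_table[regs->eax](regs->ebx, regs->ecx, regs->edx);
}
```

The saved EIP is left untouched by the dispatcher, which is what lets
the interrupted task resume where it stopped once the frame is restored.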

> > The kernel executes functions on behalf of user-mode tasks. How you get
> > this to happen is dependent upon the system architecture. Linus chose
> > to execute interrupt 80 hex (which was used by BASIC in MS-DOS).
> > Since an interrupt is a privileged instruction, when it occurs in
> > user-mode code, a trap occurs. Kernel software, pointed to by the
> > IDT handling 80 hex, analyzes the user-mode requests and performs
> > them on behalf of the user-mode code. The nature of the interrupt
> > handling code allows the interrupted task to be restarted since
> > the context of the interrupted task is saved.
>
> INT is *not* a privileged instruction. If it were, it would cause a
> General Protection Fault (#GP). It doesn't; if the interrupt gate has
> the appropriate ring brackets (which, of course, the INT 0x80 gate
> does), it will allow the gate to be crossed.

My Intel book has four pages saying it is. Further, if the kernel code
has not provided an appropriate gate, it will generate a general
protection fault. The protected-mode exceptions are #GP, #NP, #SS, and
#TS. As stated, since there is a gate for INT 0x80, appropriate
kernel-mode code will be executed as a result.
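
The gate check being argued about here can be written down as a small
model. This is a sketch of the documented rule for a software INT n in
protected mode, not code from any kernel: the vector must lie inside the
IDT, the gate's DPL must be numerically >= the caller's CPL, and the
gate must be marked present. The names (toy_gate, soft_int_outcome) are
made up for illustration.

```c
#include <assert.h>

enum int_outcome { INT_ENTERS_HANDLER, INT_RAISES_GP, INT_RAISES_NP };

/* Hypothetical summary of one IDT entry. */
struct toy_gate {
    int in_idt_limit;   /* vector falls inside the IDT limit */
    unsigned dpl;       /* descriptor privilege level of the gate, 0..3 */
    int present;        /* P bit of the gate descriptor */
};

/* Outcome of executing "INT n" at privilege level cpl through gate g. */
enum int_outcome soft_int_outcome(unsigned cpl, const struct toy_gate *g)
{
    assert(cpl <= 3 && g->dpl <= 3);

    if (!g->in_idt_limit)
        return INT_RAISES_GP;      /* no gate at all for this vector */
    if (g->dpl < cpl)
        return INT_RAISES_GP;      /* gate not reachable from this ring */
    if (!g->present)
        return INT_RAISES_NP;      /* gate exists but is not present */
    return INT_ENTERS_HANDLER;
}
```

Under this model a ring-3 caller reaches the handler only through a
gate with DPL 3, which is how a system-call vector such as 0x80 would be
set up; the same INT aimed at a DPL 0 gate ends in #GP instead.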

> > You need not issue an interrupt to get the attention of the kernel.
> > User-mode code could execute any privileged instruction. For instance,
> > the runtime library could execute the CLI instruction. This would
> > trap to the #GP(0) (General Protection) handler in the kernel at which
> > time the kernel could analyze the user-mode request. This is certainly
> > not what #GP(0) was provided for, but it could work.
>
> CLI would suck since if the process has IOPL=3 (available via system
> call for root processes) CLI is no longer a privileged operation.
> However, this is pretty much how DOSEMU works: by trapping privileged
> instructions and emulating them. The x86 provides a special "V86"
> mode which makes emulating an 8086 easier.
>

True, but otherwise-privileged instructions can of course be emulated
or even ignored by the kernel. This happens because they are privileged,
so they generate a kernel-mode trap where the kernel can decide what
to do about them. The fact that the kernel code chose to ignore an
otherwise-privileged instruction does not mean that it was not
privileged.
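
As a rough illustration of "the kernel can decide what to do about
them", here is a toy trap-and-emulate step. It assumes a fault handler
that has already been entered and simply looks at the opcode byte at the
faulting EIP. The opcode values 0xFA (CLI) and 0xFB (STI) are the real
one-byte encodings; the structure and function names (toy_task,
toy_emulate) and the idea of a per-task "virtual" interrupt flag are
assumptions made for the sketch, not DOSEMU's or the kernel's actual
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_CLI = 0xFA, OP_STI = 0xFB };

/* Hypothetical per-task state kept by the emulating kernel or monitor. */
struct toy_task {
    uint32_t eip;      /* faulting instruction pointer (offset into mem) */
    int virt_if;       /* the task's private view of the interrupt flag */
};

/* Try to emulate the instruction at t->eip.
 * Returns 1 if it was handled (EIP is advanced past it),
 * 0 if it is not an instruction this sketch knows about. */
int toy_emulate(struct toy_task *t, const uint8_t *mem)
{
    assert(t != NULL && mem != NULL);

    switch (mem[t->eip]) {
    case OP_CLI:
        t->virt_if = 0;    /* pretend interrupts are off for this task */
        break;
    case OP_STI:
        t->virt_if = 1;    /* pretend interrupts are back on */
        break;
    default:
        return 0;          /* leave it to the caller: signal, kill, ... */
    }
    t->eip += 1;           /* both opcodes are a single byte long */
    return 1;
}
```

Ignoring the instruction instead of emulating it would amount to
advancing EIP without touching virt_if.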

> > Using a TSS from user-mode, i.e., LTR will not directly load the
> > TSS because LTR is also a privileged instruction. It will also
> > trap at a #GP(0) unless the TSS is not present. In that case a
> > page-fault trap occurs.
>
> LTR is a privileged instruction and as such will #GP(0) regardless of
> whether or not *anything* is present in memory. Incidentally, LTR
> never actually touches the pointed-at TSS so even if it is not present
> it will not #PF.
>

Page 26-209 of the Intel486 Programmer's Reference Manual says
that it will page-fault.
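
For anyone following the LTR argument, the checks can be laid out as a
small model. This is a simplified sketch under stated assumptions, not a
transcription of the manual page cited above: it covers only the
register-operand form, assumes the privilege check comes before the
descriptor is examined, and leaves out any page fault taken while
reading the descriptor table itself. Note that "present" below means the
P bit of the TSS descriptor, which is a different thing from a page
being present. All names are hypothetical.

```c
#include <assert.h>

enum ltr_outcome { LTR_LOADED, LTR_GP_0, LTR_GP_SELECTOR, LTR_NP_SELECTOR };

/* Hypothetical summary of what the selector refers to. */
struct toy_tss_descriptor {
    int within_gdt_limit;   /* selector indexes a valid GDT slot */
    int is_available_tss;   /* descriptor type is "available TSS" */
    int present;            /* descriptor P bit, not a paging bit */
};

/* Model of LTR with a register operand, executed at privilege level cpl. */
enum ltr_outcome toy_ltr(unsigned cpl, const struct toy_tss_descriptor *d)
{
    assert(cpl <= 3);

    if (cpl != 0)
        return LTR_GP_0;           /* assumed to be checked first */
    if (!d->within_gdt_limit || !d->is_available_tss)
        return LTR_GP_SELECTOR;    /* bad selector or wrong type */
    if (!d->present)
        return LTR_NP_SELECTOR;    /* segment-not-present, not #PF */
    return LTR_LOADED;
}
```

In this model a ring-3 caller gets the same result whatever the
descriptor looks like, and a not-present descriptor seen from ring 0
yields #NP rather than a page fault.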

> > In any event, such a TSS should never be accessible from user-mode
> > or else one could deliberately damage the OS. It would have to be
> > faulted-in so the kernel could supply the real one, not one trashed
> > from user-mode code.
>
> A TSS, or a descriptor table, writable from user space would be like a
> kernel writable from user space: this OS is not and cannot be secure.
>

Correct. Definitely not the way to make an OS.

> > These extra "indirection" steps may actually slow the request for
> > kernel services although this is not necessarily so because, for
> > instance, a page-fault has been optimized to be very fast since
> > it commonly occurs.
>
> It WILL slow the request down (a system call is *much* slower than an
> intraprivilege function call), but there is no alternative.
>
> > Any time saved in executing kernel function calls by executing
> > code in "strange" ways may be lost as soon as a kernel function
> > needs to access something provided by user-mode code such as
> > a pointer to a buffer or even a register value itself. It's
> > the total time that counts, not the time to make the transition
> > from user-mode to the kernel.
>
> This isn't true; the kernel is perfectly able to access anything from
> the process as in user space; it is still the same process, just
> running in kernel mode.
>

It isn't running user code!! Its code selector references the kernel
code. Its data selector references kernel data, and the stack (SS) is the
kernel stack.... How could you figure that it's the same process? If you
are referring to "current" and other kernel variables that reference the
task that was last switched to by the kernel, this is a totally
artificial (but certainly useful) mechanism used by Unix and Unix-like
operating systems.
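
The selector point can be shown with a few lines of C that split a
16-bit selector into its architectural fields: bits 0-1 are the
requested privilege level, bit 2 picks GDT or LDT, and the remaining
bits index the descriptor table. The field layout is the standard ix86
one; the struct and function names are made up, and the example values
0x10 and 0x23 mentioned afterwards are only illustrative of a
kernel-style and a user-style code selector.

```c
#include <stdint.h>

/* Hypothetical holder for the three fields of a segment selector. */
struct selector_fields {
    unsigned index;   /* descriptor table index (bits 3..15) */
    unsigned table;   /* 0 = GDT, 1 = LDT (bit 2) */
    unsigned rpl;     /* requested privilege level (bits 0..1) */
};

/* Split a 16-bit segment selector into its fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.rpl = sel & 0x3u;
    f.table = (sel >> 2) & 0x1u;
    f.index = (unsigned)sel >> 3;
    return f;
}
```

Decoding 0x10 gives GDT index 2 at privilege level 0, while 0x23 gives
GDT index 4 at privilege level 3, so the value sitting in CS is enough
to tell which side of the boundary is executing.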

> > There is work being done on a new "SysCall" method. Time will
> > tell if it is really faster. My guess is that it will be six of
> > one and half a dozen of the other. Personally I prefer a
> > software interrupt because it seems as though this is what
> > the ix86 CPUs were designed to use.
>
> SYSENTER is *much* faster, mainly because it doesn't try to muck with
> segments. Design decisions that made sense on the i286 (which is
> where most of the segment crap came from) no longer make sense.

Cheers,
Dick Johnson
***** FILE SYSTEM WAS MODIFIED *****
Penguin : Linux version 2.1.118 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
