Re: 2.3.30pre1 syscall w/6 args support?

Linus Torvalds (torvalds@transmeta.com)
Tue, 7 Dec 1999 21:49:59 -0800 (PST)


On Wed, 8 Dec 1999, Artur Skawina wrote:
>
> well, while trying to get it to work i've often wondered what the
> person who came up with the sysenter/sysexit scheme at intel had been
> sniffing...

I agree. I don't understand why they couldn't just have saved the old
eip/esp in another msr or something.

> the choice becomes: do you want the libc to do this (presumably just one
> test on bootup which selects the proper stubs to use) or do you want the
> kernel to do very much the same (by providing the stubs (or as you
> suggested some kind of intermidiate single C entry point, which isn't a
> good solution either).

It wouldn't be a C entry-point: it would really be an assembly
entry-point, and you'd have special calling conventions for it (for one
thing, you'd have to set up %eax to be the system call number).

As far as the kernel would be concerned, it would be just another of the
"high virtual memory mappings" - the kernel already uses them internally
simply because it is a good idea to avoid a pointer lookup for some things
that are so common that you can more efficiently cache them in the TLB. So
the kernel as certain magical addresses where the IO-APIC is always at one
fixed virtual address instead of having to be looked up on every access to
it.

So this would be just another such "virtually pinned" page, except it
would be readable from user space too (but obviously not writable). It can
contain any number of trampolines and/or other data (although I don't
really see what static data would ever be that timing-critical).

> I did consider this. The reasons i didn't do it that way were (1) to
> make it as nonitrusive as possible (the patch currently is just a few
> dozen lines) and (2) policy -- adding hardwired userspace mappings.
> (the performance difference would be ~ a few cycles, so that's not a
> big issue). Now, if (1) isn't a problem i may consider that scheme again.
> Assuming nobody sees any problems w/ (2) above?

Note that we definitely don't want to eat up virtual space in the
"traditional" user space area - 0x00000000 - 0xC0000000. The pinned down
thing would be a kernel mapping, basically a vmalloc()-like thing except
normal vmallocs are not accessible from user space (the page tables have
been set up to not allow access to them for obvious reasons).

(And no, I'm not really suggesting you use vmalloc() - I'm really
suggesting you look at the stuff in <asm/fixmap.h> which gets set up at
predefined addresses rather than the more dynamic vmalloc()).

> Yep. In theory this seems perfect. Other people have already pointed out some
> obvious issues, there are probably more; most are likely solvable, but could
> add significantly to the complexity. [there are some less obvious issues like
> you can not, for various reasons, blindly use the sysenter/sysexit pair
> (eg you can not return with sysexit from execve) and in some cases i found
> int80 to be faster (eg signal handling, when restarting syscalls)]

sysenter is really a strange thing. Have you verified that your current
code works with vm86 mode programs like dosemu or with wine, for example?

I can see why intel wanted to implement sysenter/sysexit, but it IS a
rather strange way of doing what they did. Not very flexible.

> Not that i think that this is relevant in this case -- the question is:
> is there anything that the kernel can do that could not be done equally
> well in libc? Why shouldn't libc be setting up that special magic entry
> point all by itself?

Because libc will do it every single time a process gets started.

I'm a latency person. I've tried to talk to people about pre-linking libc
at a fixed address, and avoid all the horrible run-time linking for the
case when the pre-linked address is available. So far nobody has done
this on Linux, and it makes our process startup slower.

I don't want to make it worse.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/