On Mon, Oct 06, 2014 at 01:23:30PM -0700, David Daney wrote:
From: David Daney <david.daney@xxxxxxxxxx>
In order for MIPS to be able to support a non-executable stack, we
need to supply a method to specify a userspace area that can be used
for executing emulated branch delay slot instructions.
We add a new system call, sys_set_fpuemul_xol_area so that userspace
threads that are using the FPU can specify the location of the FPU
emulation out of line execution area.
Background:
MIPS floating point support requires that any instruction that cannot
be directly executed by the FPU, be emulated by the kernel. Part of
this emulation involves executing non-FPU instructions that fall in
the delay slots of FP branch instructions. Since the beginning of
MIPS/Linux time, this has been done by placing the instructions on the
userspace thread stack, and executing them there, as the instructions
must be executed in the MM context of the thread receiving the
emulation.
Because of this, the de facto MIPS Linux userspace ABI requires that
the userspace thread have an executable stack. It is de facto,
because it is not written anywhere that this must be the case, but it
is nevertheless a requirement.
Problem:
How do we get MIPS Linux to use a non-executable stack in the face of
the FPU emulation problem?
Since userspace desires to change the ABI, put some of the onus on the
userspace code. Any userspace thread desiring a non-executable stack
must allocate a 4-byte aligned area at least 8 bytes long that
has read/write/execute permissions and pass the address of that area
to the kernel with the new sys_set_fpuemul_xol_area system call.
This is similar to how we require userspace to notify the kernel of
the value of the thread local pointer.
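For readers following along, the userspace side of the proposal could look roughly like the sketch below. It is only an illustration under assumptions: the syscall number is a placeholder (none is given here), the wrapper name and the single-address argument are hypothetical, and a whole anonymous page is mapped simply because mmap works in page units, which more than covers the 8-byte, 4-byte-aligned minimum.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* HYPOTHETICAL: no syscall number is given in the proposal text; a real
 * value would come from the kernel headers if the patch were merged.
 * With this placeholder the call simply fails (returns -1). */
#ifndef __NR_set_fpuemul_xol_area
#define __NR_set_fpuemul_xol_area (-1L)
#endif

#define XOL_AREA_MIN_BYTES 8 /* minimum size stated in the proposal */

/* Map one anonymous read/write/execute page to serve as the XOL area.
 * Returns NULL if the mapping is refused (e.g. by a W^X policy). */
static void *alloc_xol_area(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* mmap returns page-aligned memory, so 4-byte alignment holds. */
    assert(((uintptr_t)p & 3) == 0);
    return p;
}

/* Hand the area's address to the kernel. The one-argument form is an
 * assumption based on "pass the address of that area". */
static long set_fpuemul_xol_area(void *area)
{
    return syscall(__NR_set_fpuemul_xol_area, (unsigned long)area);
}
```

Under this scheme each thread wanting a non-executable stack would call alloc_xol_area() and then set_fpuemul_xol_area() on the result before using the FPU.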
Userspace should play no part in this; requiring userspace to help
make special accommodations for fpu emulation largely defeats the
purpose of fpu emulation.
The kernel is perfectly capable of mapping an appropriate page. The
mapping should happen at exec time, and at clone time with CLONE_VM
unless the kernel is going to handle mutual exclusion so that only one
thread can be using the page at a time.
(Using one page for the whole process, and excluding simultaneous
execution of fpu emulation in multiple threads, may be the more
practical approach.)
As an alternative, if the space of possible instructions with a delay
slot is sufficiently small, all such instructions could be mapped as
immutable code in a shared mapping, each at a fixed offset in the
mapping. I suspect this would be borderline-impractical (multiple
megabytes?), but it is the cleanest solution otherwise.
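To make the size question concrete, here is a rough sketch of the fixed-offset arithmetic. The 8-byte slot is an assumption borrowed from the minimum area size in the proposal, and the instruction count is left as a parameter because I have not counted the real set; the names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: each table entry occupies 8 bytes, mirroring the minimum
 * XOL area size in the proposal. The real slot size could differ. */
#define XOL_SLOT_BYTES 8u

static_assert(XOL_SLOT_BYTES % 4 == 0,
              "slots must preserve 4-byte instruction alignment");

/* Fixed offset of the entry for the instruction with the given index. */
static uint64_t xol_slot_offset(uint64_t index)
{
    return index * XOL_SLOT_BYTES;
}

/* Total size of a shared mapping holding n_insns entries. */
static uint64_t xol_table_bytes(uint64_t n_insns)
{
    return n_insns * XOL_SLOT_BYTES;
}
```

With a purely hypothetical count of 2^18 distinct instructions, xol_table_bytes() gives 2 MiB, which is the sort of figure I have in mind; the real count may be very different.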
Rich