On Mon, Oct 06, 2014 at 05:33:18PM -0700, David Daney wrote:
> [...]
> Not at all, I was thinking of soft-float ABIs, as they never execute
> FP instructions, and are often used on systems with no FPU. In fact
> many non-FPU systems never execute any hard-float code. So those
> systems should not suffer large performance regressions from any
> change made to support a non-executable stack.
>
> It would be nice to support, but not doing so would not be a
> regression from current behavior.
>
> Why not? It will emit any instructions we care to make it emit. If
> we want it to emit crypto instructions with patented algorithms,
> then it will do that. But we would still like to use a generic
> kernel with generic FPU support.
>
> The most straightforward way (and the currently implemented way) of
> doing this is to execute the instructions in question out-of-line
> (on the userspace stack).
>
> The question here is: What is the best way to get to a
> non-executable stack?
>
> The consensus among MIPS developers is that we should continue using
> the out-of-line execution trick, but do it somewhere other than in
> stack memory.

My experience has been that hardware and software developers focused
on a particular hardware target are generally unqualified to make
decisions that affect the design and operation of libc or the kernel.
They are not experts in these areas. It was apparent early on in this
thread, when you mentioned the idea that "not all threads would need
fpu support", that you were thinking from a standpoint of custom
low-level software and not a general purpose libc that cannot read the
application author's mind.

It seems nobody had thought of the
impossibility of doing lazy setup (inability to handle failure) and
the necessity of always initializing this stuff at pthread_create
time, either. Design issues like this should be run by experts in the
libc area early on, not as an afterthought.

How do you answer Andy Lutomirski's question about what happens when a
signal handler interrupts execution while the program counter is
pointing at this "out-of-line execution" trampoline? This seems like a
show-stopper for using anything other than the stack.

> One way of doing this is to have the kernel magically generate
> thread local memory regions.
>
> Another option is to have userspace manage the out-of-line
> execution areas.
>
> As is often the case, each approach has different pluses and minuses.

Having the kernel magically do it would be better, but I'm doubtful
that solution works anyway due to the above signal handler/nesting
issue.

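
The signal handler/nesting concern can be modelled in plain C without
any MIPS specifics. The sketch below is only an illustration under
stated assumptions: the types and functions (ool_frame, single_slot,
frame_stack and their helpers) are hypothetical and do not correspond
to any kernel or libc interface, and the instruction words are
placeholders. It compares a single fixed per-thread slot with frames
that are pushed the way stack memory is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one pending out-of-line emulation. */
struct ool_frame {
    uint32_t insn;     /* placeholder for the instruction run out of line */
    uintptr_t resume;  /* address to continue at afterwards */
};

/* Model A: one fixed slot per thread (assumption, for illustration). */
struct single_slot {
    struct ool_frame f;
};

static void single_begin(struct single_slot *s, uint32_t insn, uintptr_t resume)
{
    /* Overwrites whatever emulation was already pending in the slot. */
    s->f.insn = insn;
    s->f.resume = resume;
}

/* Model B: frames are pushed like stack memory, so nesting is preserved. */
#define OOL_DEPTH 8

struct frame_stack {
    struct ool_frame f[OOL_DEPTH];
    size_t top;
};

static int stack_begin(struct frame_stack *s, uint32_t insn, uintptr_t resume)
{
    if (s->top == OOL_DEPTH)
        return -1;                 /* no room left for another frame */
    s->f[s->top].insn = insn;
    s->f[s->top].resume = resume;
    s->top++;
    return 0;
}

static struct ool_frame stack_end(struct frame_stack *s)
{
    assert(s->top > 0);
    return s->f[--s->top];
}

/* Returns 1 if the interrupted emulation's resume address was lost. */
static int single_slot_loses_state(void)
{
    struct single_slot s;
    single_begin(&s, 0x11111111u, 0x1000); /* interrupted code starts one */
    single_begin(&s, 0x22222222u, 0x2000); /* signal handler starts another */
    /* The handler returns; the slot now holds the handler's data. */
    return s.f.resume != 0x1000;
}

/* Returns 1 if the interrupted emulation's resume address survived. */
static int frame_stack_keeps_state(void)
{
    struct frame_stack s = { .top = 0 };
    stack_begin(&s, 0x11111111u, 0x1000);  /* interrupted code */
    stack_begin(&s, 0x22222222u, 0x2000);  /* nested use in the handler */
    (void)stack_end(&s);                   /* handler's emulation completes */
    return stack_end(&s).resume == 0x1000;
}
```

In this model the single slot loses the interrupted emulation's state
as soon as the handler needs an emulation of its own, while the
stack-like layout keeps both. It does not cover a handler that never
returns normally (for example one that leaves via longjmp), which a
real design would also need to consider.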
Rich