On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
Right now, Linux isn't all that friendly to JIT emulators.
Here are the problems and suggestions to improve the situation.
There is an SE Linux execmem restriction that enforces W^X.
Assuming you don't wish to just disable SE Linux, there are
two ugly ways around the problem. You can mmap a file twice,
or you can abuse SysV shared memory. The mmap method requires
that you know of a filesystem mounted rw,exec where you can
write a very large temporary file. This arbitrary filesystem,
rather than swap space, will be the backing store. The SysV
shared memory method requires an undocumented flag and is
subject to some annoying size limits. Both methods create
objects that will fail to be deleted if the program dies
before marking the objects for deletion.
If the policy forbidding self-modifying code lacks a method of
exempting programs such as JIT interpreters (which I doubt) then
it's a problem. I'm with Alan on this one.
On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:Processors often have annoying limits on the immediate values
in instructions. An x86 or x86_64 JIT can go a bit faster if
all allocations are kept to the low 2 GB of address space.
There are also reasons for a 32bit-to-x86_64 JIT to chose
a nearly arbitrary 2 GB region that lies above 4 GB.
Other archs have other limits, such as 32 MB or 256 MB.
This sort of logic might be appropriate for a sort of parametrized
and specialized vma allocator setting the policy in /proc/ along
with various sorts of limits. There are limits to such and at some
point things will have to manually manage their own process address
spaces in a platform-specific fashion. If kernel assistance here is
rejected they may have to do so in all cases.
On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:Additions to better support JIT emulators:
a. sysctl to set IPC_RMID by default
This is a bad idea. The standard semantics are needed for programs
relying upon them.
On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:c. open() flag to unlink a file before returning the fd
You probably want a tmpfile(3) -like affair which never has a pathname
to begin with. It could be useful for security purposes more generally.
On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:d. mremap() flag to always keep the old mapping
This sounds vaguely like another syscall, like mdup(). This is
particularly meaningful in the context of anonymous memory, for
which there is no method of replicating mappings within a single
process address space.
On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:e. mremap() flag to get a read/write mapping of a read/exec one
f. mremap() flag to get a read/exec mapping of a read/write one
Presumably to be used in conjunction with keeping the old mapping.
A composite mdup()/mremap() and mprotect(), presumably saving a TLB
flush or other sorts of overhead, may make some sort of sense here.
Odds are it'll get rejected as the sequence of syscalls is a rather
precise equivalent, though it would optimize things (as would other
composite syscalls, e.g. ones combining fork() and execve() etc.).
On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:g. mremap() flag to make the 5th arg (new addr) be the upper limit
h. 6-bit wide mremap() "flag" to set the upper limit above given base
Essentially more placement support for mremap()/mdup(). It's not clear
to me those particular semantics are the ideal ones. A target range
for placement should do, if not manual address space management.
On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:i. support the prot argument to remap_file_pages
This is probably going to happen anyway.
On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:j. a documented way (madvise?) to punch same-VMA zero-page holes
This is MADV_REMOVE, though most filesystems don't support it. Do you
need it for more than tmpfs?