Re: JIT emulator needs

From: Albert Cahalan
Date: Tue Jun 19 2007 - 23:16:39 EST


On 6/19/07, William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:

Right now, Linux isn't all that friendly to JIT emulators.
Here are the problems and suggestions to improve the situation.
There is an SE Linux execmem restriction that enforces W^X.
Assuming you don't wish to just disable SE Linux, there are
two ugly ways around the problem. You can mmap a file twice,
or you can abuse SysV shared memory. The mmap method requires
that you know of a filesystem mounted rw,exec where you can
write a very large temporary file. This arbitrary filesystem,
rather than swap space, will be the backing store. The SysV
shared memory method requires an undocumented flag and is
subject to some annoying size limits. Both methods create
objects that will fail to be deleted if the program dies
before marking the objects for deletion.
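
[Illustration] The "mmap a file twice" workaround described above might look roughly like the following. This is a minimal sketch, not code from any particular emulator: the names jit_region, jit_region_create and jit_region_destroy are hypothetical, and the caller is assumed to supply a mkstemp() template on a filesystem mounted rw,exec.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Two views of the same file pages (hypothetical helper type). */
struct jit_region {
    void  *rw;   /* PROT_READ | PROT_WRITE view, used to emit code */
    void  *rx;   /* PROT_READ | PROT_EXEC view, used to run code   */
    size_t len;
};

/* path_template is a writable mkstemp() template such as "/tmp/jitXXXXXX";
 * the directory is assumed to be on an rw,exec mount.
 * Returns 0 on success, -1 on failure. */
static int jit_region_create(struct jit_region *r, char *path_template,
                             size_t len)
{
    assert(r != NULL && path_template != NULL && len > 0);

    int fd = mkstemp(path_template);
    if (fd < 0)
        return -1;

    /* If the process dies between mkstemp() and unlink(), the file is
     * left behind. This is the cleanup problem described above. */
    unlink(path_template);

    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    /* Neither mapping is ever writable and executable at once. */
    void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *rx = mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd);   /* the mappings keep the file alive */

    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED)
            munmap(rw, len);
        if (rx != MAP_FAILED)
            munmap(rx, len);
        return -1;
    }

    r->rw = rw;
    r->rx = rx;
    r->len = len;
    return 0;
}

static void jit_region_destroy(struct jit_region *r)
{
    munmap(r->rw, r->len);
    munmap(r->rx, r->len);
}
```

Bytes stored through r.rw become visible through r.rx because both views share the same pages. Note that the backing store is the chosen filesystem rather than swap, as the text above points out.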

If the policy forbidding self-modifying code lacks a method of
exempting programs such as JIT interpreters (which I doubt) then
it's a problem. I'm with Alan on this one.

It does and it doesn't. There is not a reasonable way for a
user to mark an app as needing full self-modifying ability.
It's not like the executable stack, which can be set via the
ELF note markings on the executable. (ELF note markings are
ideal because they cannot be used via a ret-to-libc attack.)

With admin privs, one can change SE Linux settings. Mark the
executable, disable the protection system-wide, generate a
completely new SE Linux policy, or just turn SE Linux off.

Normally we don't expect/require admin privs to install an
executable in one's own ~/bin directory. This is broken.

It ought to be easier to get a JIT working well without
enabling arbitrary mprotect. This would allow a JIT to
partially benefit from the recent security enhancements.
(think of all the buggy browser-based JIT things!)

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
Processors often have annoying limits on the immediate values
in instructions. An x86 or x86_64 JIT can go a bit faster if
all allocations are kept to the low 2 GB of address space.
There are also reasons for a 32bit-to-x86_64 JIT to choose
a nearly arbitrary 2 GB region that lies above 4 GB.
Other archs have other limits, such as 32 MB or 256 MB.

This sort of logic might be appropriate for a sort of parametrized
and specialized vma allocator setting the policy in /proc/ along
with various sorts of limits. There are limits to such and at some
point things will have to manually manage their own process address
spaces in a platform-specific fashion. If kernel assistance here is
rejected they may have to do so in all cases.

I prefer ELF notes (for start-up allocations) and prctl,
plus a mmap flag for per-allocation behavior.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
Additions to better support JIT emulators:
a. sysctl to set IPC_RMID by default

This is a bad idea. The standard semantics are needed for programs
relying upon them.

I didn't mean that the default default :-) setting would change.
I meant that people could change the behavior from a boot script.
Things that break are really foul and nasty anyway, probably with
serious problems that ought to get fixed.
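
[Illustration] Without such a sysctl, a program has to issue IPC_RMID itself as early as it can. On Linux a segment marked this way stays usable until the last detach. A minimal sketch, with the hypothetical helper name shm_alloc_autodelete:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create and attach a private SysV segment, then mark it for deletion
 * right away. Returns NULL on failure. */
static void *shm_alloc_autodelete(size_t len)
{
    assert(len > 0);

    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
    if (id < 0)
        return NULL;

    void *p = shmat(id, NULL, 0);

    /* If the process is killed between shmget() and this call, the
     * segment leaks. A "set IPC_RMID by default" sysctl would close
     * that window. */
    shmctl(id, IPC_RMID, NULL);

    return p == (void *)-1 ? NULL : p;
}
```

The segment is destroyed when the caller detaches it with shmdt() or exits.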

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
c. open() flag to unlink a file before returning the fd

You probably want a tmpfile(3) -like affair which never has a pathname
to begin with. It could be useful for security purposes more generally.

Yes, exactly. I think there are some possible optimizations
available too, particularly with the cifs filesystem.
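
[Illustration] A sketch of what "a file that never has a pathname" can look like from user space. Kernels much newer than this thread (Linux 3.11 and later) offer an O_TMPFILE open() flag along these lines, so the sketch tries it when the headers define it. Otherwise, or if the filesystem refuses, it falls back to the racy mkstemp()+unlink() sequence that the proposal is meant to replace. The helper names open_unnamed and link_count are hypothetical.

```c
#define _GNU_SOURCE           /* exposes O_TMPFILE in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return an fd for a read/write file in dir that has no name,
 * or -1 on failure. */
static int open_unnamed(const char *dir)
{
    assert(dir != NULL);
    int fd = -1;

#ifdef O_TMPFILE
    /* The file is created without ever being linked into dir. */
    fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd >= 0)
        return fd;
#endif

    /* Fallback: the name exists briefly, so a crash between the two
     * calls leaves the file behind. */
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/jitXXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}

/* Number of directory entries that still refer to the file, or -1. */
static long link_count(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? (long)st.st_nlink : -1;
}
```

On success link_count(fd) is normally 0, meaning that nothing needs to be cleaned up if the program dies.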

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
d. mremap() flag to always keep the old mapping

This sounds vaguely like another syscall, like mdup(). This is
particularly meaningful in the context of anonymous memory, for
which there is no method of replicating mappings within a single
process address space.

Yes, mdup() and probably mdup2(). It could be mremap flags or not.

JIT emulators generally need a second mapping so that they can
have both read/write and execute for the same physical memory.

It is somewhat tolerable to have SE Linux enforce that the second
mapping be randomized. (it helps security greatly, but slows the
emulator by a tiny bit)
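
[Illustration] Something close to mdup() can be approximated for shared mappings. The mremap(2) man page describes that calling mremap() with old_size set to 0 on a MAP_SHARED mapping creates a second mapping of the same pages. The sketch below relies on that behavior, which I have not checked against the kernels current in this thread. It does not help for private anonymous memory, which is the gap discussed above. The helper names alloc_shared_rw and dup_mapping_rx are hypothetical, and whether a given SE Linux policy permits the mprotect() call is outside the sketch.

```c
#define _GNU_SOURCE           /* for mremap() and MREMAP_MAYMOVE */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Shared anonymous read/write memory; NULL on failure. */
static void *alloc_shared_rw(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Create a second, read/exec view of the shared mapping at rw.
 * The original mapping is kept and stays read/write.
 * Returns NULL on failure. */
static void *dup_mapping_rx(void *rw, size_t len)
{
    assert(rw != NULL && len > 0);

    /* old_size == 0: duplicate the mapping instead of moving it. */
    void *alias = mremap(rw, 0, len, MREMAP_MAYMOVE);
    if (alias == MAP_FAILED)
        return NULL;

    /* The alias inherits read/write; switch this view to read/exec. */
    if (mprotect(alias, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(alias, len);
        return NULL;
    }
    return alias;
}
```

The kernel picks the alias address here. A placement-controlling flag, as proposed in (g) and (h), would be a separate matter.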

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
e. mremap() flag to get a read/write mapping of a read/exec one
f. mremap() flag to get a read/exec mapping of a read/write one

Presumably to be used in conjunction with keeping the old mapping.
A composite mdup()/mremap() and mprotect(), presumably saving a TLB
flush or other sorts of overhead, may make some sort of sense here.
Odds are it'll get rejected as the sequence of syscalls is a rather
precise equivalent, though it would optimize things (as would other
composite syscalls, e.g. ones combining fork() and execve() etc.).

A few mremap flags ought to do the job I think.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
g. mremap() flag to make the 5th arg (new addr) be the upper limit
h. 6-bit wide mremap() "flag" to set the upper limit above given base

Essentially more placement support for mremap()/mdup(). It's not clear
to me those particular semantics are the ideal ones. A target range
for placement should do, if not manual address space management.

Yes. I'm looking for the change that will help JIT emulators
the most while hurting security the least.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
i. support the prot argument to remap_file_pages

This is probably going to happen anyway.

Great.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
j. a documented way (madvise?) to punch same-VMA zero-page holes

This is MADV_REMOVE, though most filesystems don't support it. Do you
need it for more than tmpfs?

Yes and no. It's painful to be restricted to one backing store.
Covering MAP_ANONYMOUS and SysV shared mem is most critical.
I suppose that other filesystems may require multiple flags to
deal with the desire to (not) punch a hole on disk and what to
do if that isn't possible.