Re: JIT emulator needs

From: Albert Cahalan
Date: Wed Jun 20 2007 - 14:50:40 EST


On 6/20/07, William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
On 6/19/07, William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
If the policy forbidding self-modifying code lacks a method of
exempting programs such as JIT interpreters (which I doubt) then
it's a problem. I'm with Alan on this one.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
It does and it doesn't. There is not a reasonable way for a
user to mark an app as needing full self-modifying ability.
It's not like the executable stack, which can be set via the
ELF note markings on the executable. (ELF note markings are
ideal because they can not be used via a ret-to-libc attack)
With admin privs, one can change SE Linux settings. Mark the
executable, disable the protection system-wide, generate a
completely new SE Linux policy, or just turn SE Linux off.
Normally we don't expect/require admin privs to install an
executable in one's own ~/bin directory. This is broken.
It ought to be easier to get a JIT working well without
enabling arbitrary mprotect. This would allow a JIT to
partially benefit from the recent security enhancements.
(think of all the buggy browser-based JIT things!)

I presumed an ELF note or extended filesystem attributes were already
in place for this sort of affair. It may be that the model implemented
is so restrictive that users are forbidden to create new executables,
in which case using a different model is certainly in order. Otherwise
the ELF note or attributes need to be implemented.

Users can create executables. Some will be non-functional
unless specially marked by an admin.

What is the goal here? I see no reasonable goal that would
result in such a policy.

On 6/19/07, William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
This sort of logic might be appropriate for a sort of parametrized
and specialized vma allocator setting the policy in /proc/ along
with various sorts of limits. There are limits to such and at some
point things will have to manually manage their own process address
spaces in a platform-specific fashion. If kernel assistance here is
rejected they may have to do so in all cases.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
I prefer ELF notes (for start-up allocations) and prctl,
plus a mmap flag for per-allocation behavior.

Beware that the kernel (upstream of me) will likely refuse to support
to exotic mmap() placement policies. At that point userspace will have
to implement them itself with a front-end to mmap().

Userspace can actually live without kernel placement support for
everything but the executable itself, which is already implemented via
ELF loading standards. This is not to downplay the tremendous amounts
of pain involved for moving the stack, getting ld.so to land in the
right place, and so on. Actually I'm less sure about .interp placement.
In any event, exotic virtualspace allocation policies are largely yet
another "simple matter of programming" implementable entirely in
userspace.

When you go that route, you may need to abandon libc. I've done exactly
that for one emulator. It was not easy. Nearly nobody will want to go
down that path.

Things improve a bit if MAP_ANONYMOUS and SysV shared mem allocations
can be made to ignore the available memory checking. If I could allocate
a 2 GB chunk on a system with 1 GB total swap+RAM, then I could use
that as an area in which to perform MAP_FIXED allocations. As of now
this would require either adding the swap space or disabling the
available memory checking system-wide via sysctl.

On 6/19/07, William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
This is a bad idea. The standard semantics are needed for programs
relying upon them.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
I didn't mean that the default default :-) setting would change.
I meant that people could change the behavior from a boot script.
Things that break are really foul and nasty anyway, probably with
serious problems that ought to get fixed.

It's actually not a good idea to make it the default even via sysctl.
People won't realize something will break until it does, and what will
break is likely to be a database responsible for data integrity. The
IPC_RMID creation flag should suffice.

It's highly unlikely that such breakage would cause corruption.
Most likely it would cause the database to exit with an error
about failing to attach to a SysV shared memory segment.

I believe that a major cause of reboots is that admins are
unaware of SysV shared memory cruft left behind by apps that
crashed at the wrong moment or had other bugs. If something
is eating memory and you don't know what it is, you reboot.

On 6/19/07, William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
This is MADV_REMOVE, though most filesystems don't support it. Do you
need it for more than tmpfs?

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
Yes and no. It's painful to be restricted to one backing store.
Covering MAP_ANONYMOUS and SysV shared mem is most critical.
I suppose that other filesystems may require multiple flags to
deal with the desire to (not) punch a hole on disk and what to
do if that isn't possible.

If those two are the bare necessities, they're already in place.

Well NONE of this stuff is absolutely required to run a JIT,
and one doesn't even need a JIT if one likes pure emulation.
All of this is about optimization and failure clean-up.

MAP_ANONYMOUS and SysV shared mem are good for transient things.
Sometimes a JIT author wants to keep a persistent image on disk.
In this case, it is much better to use the disk as backing store.
Also, sometimes one prefers to use a specific filesystem because
swap may be slower, smaller, or of unknown quality.

BTW, a mdup2 is great for DSP algorithms as well. It can allow
for wrap-around arrays, greatly simplifying and speeding up
things like filters.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/