Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor

From: Madhavan T. Venkataraman
Date: Sun Aug 02 2020 - 14:54:45 EST

More responses inline.

On 7/28/20 12:31 PM, Andy Lutomirski wrote:
>> On Jul 28, 2020, at 6:11 AM, madvenka@xxxxxxxxxxxxxxxxxxx wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@xxxxxxxxxxxxxxxxxxx>
> 2. Use existing kernel functionality. Raise a signal, modify the
> state, and return from the signal. This is very flexible and may not
> be all that much slower than trampfd.

Let me make sure I understand. You are suggesting that the trampoline
code raise a signal and that the signal handler set up the context so
that, when the handler returns, execution resumes in the target
function with the context correctly set up. And this trampoline code
can be generated statically at build time, so using it raises no
security issues.

Have I understood your suggestion correctly?

My argument would be that this always incurs the overhead of a trip
to the kernel. If I am not mistaken, it is actually two trips: one to
raise the signal and one to return from the handler via sigreturn.
With trampfd, the kernel can generate the trampoline code so that
there is no performance penalty at all.

Signals also have other problems. Which signal number would we use for
this purpose? An existing one might conflict with a signal the
application already handles, and getting a new signal number allocated
for this could meet with resistance from the community.

Also, signals are asynchronous, so they are vulnerable to race
conditions. To prevent other signals from arriving while the raised
signal is being handled, we would need to block and unblock signals,
which will cause more overhead.
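
To put a rough number on the signal round-trip cost, here is a minimal
micro-benchmark sketch, assuming a POSIX system. It installs a SIGUSR1
handler with sigaction(), raises the signal n times with raise(), and
reports the average time per round trip. The handler only counts
invocations; a real trampoline handler would also have to rewrite the
saved register context, which this sketch does not attempt. The
function names are illustrative only, not part of any proposed API.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

/* Number of handler invocations; sig_atomic_t is safe to update
 * from a signal handler. */
static volatile sig_atomic_t handled;

static void on_sigusr1(int sig)
{
    (void)sig;
    /* Illustrative only: a real trampoline handler would rewrite the
     * saved register context (ucontext_t) here. */
    handled++;
}

/* Installs the counting handler. Returns 0 on success. */
static int install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Raises SIGUSR1 n times and returns the average nanoseconds per
 * raise() + handler + return round trip, or -1.0 on error. */
static double time_signal_round_trips(long n)
{
    struct timespec t0, t1;

    if (n <= 0 || install_handler() != 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < n; i++)
        raise(SIGUSR1); /* enters the kernel to send; sigreturn enters it again */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)n;
}
```

The result is machine-dependent, so it is only useful for comparison
against, say, a plain indirect function call on the same machine. POSIX
requires that raise() not return until the handler has returned, so
handled equals n afterwards in a single-threaded program.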

> 3. Use a syscall. Instead of having the kernel handle page faults,
> have the trampoline code push the syscall nr register, load a special
> new syscall nr into the syscall nr register, and do a syscall. On
> x86_64, this would be:
> pushq %rax
> movq $__NR_magic_trampoline, %rax
> syscall
> with some adjustment if the stack slot you're clobbering is important.

How is this better than the kernel handling an address fault?
The system call still needs to do the same work as the fault handler.
We would still need to specify the register and stack contexts
beforehand so that the system call can do its job.

Also, this always incurs a trip to the kernel. With trampfd, the kernel
could generate the code to avoid the performance penalty.
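
The mechanics of option 3 (load a syscall number into the syscall-number
register, then execute the syscall instruction) can be sketched in C for
x86_64 Linux. __NR_magic_trampoline is hypothetical, so SYS_getpid
stands in for it here, and raw_syscall0() is an illustrative name.
Unlike the quoted sequence, the sketch does not push %rax, because the
compiler handles register allocation around the inline asm. On other
architectures it falls back to libc's generic syscall() wrapper.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Issues a zero-argument system call. SYS_getpid is used as a stand-in
 * for the hypothetical __NR_magic_trampoline. */
static long raw_syscall0(long nr)
{
#if defined(__x86_64__)
    long ret;

    /* x86_64 Linux: number in %rax, result back in %rax; the syscall
     * instruction clobbers %rcx and %r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Other architectures: use libc's generic wrapper. */
    return syscall(nr);
#endif
}
```

raw_syscall0(SYS_getpid) returns the same value as getpid(). Every
invocation is a full user/kernel round trip, which is the
per-invocation kernel trip noted above.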

> Also, will using trampfd cause issues with various unwinders? I can
> easily imagine unwinders expecting code to be readable, although this
> is slowly going away for other reasons.

I need to study unwinders a little before I can answer this question,
so please bear with me.

> All this being said, I think that the kernel should absolutely add a
> sensible interface for JITs to use to materialize their code. This
> would integrate sanely with LSMs and wouldn't require hacks like using
> files, etc. A cleverly designed JIT interface could function without
> serialization IPIs, and even lame architectures like x86 could
> potentially avoid shootdown IPIs if the interface copied code instead
> of playing virtual memory games. At its very simplest, this could be:
> void *jit_create_code(const void *source, size_t len);
> and the result would be a new anonymous mapping that contains exactly
> the code requested. There could also be:
> int jittfd_create(...);
> that does something similar but creates a memfd. A nicer
> implementation for short JIT sequences would allow appending more code
> to an existing JIT region. On x86, an appendable JIT region would
> start filled with 0xCC, and I bet there's a way to materialize new
> code into a previously 0xcc-filled virtual page without any
> synchronization. One approach would be to start with:
> <some code>
> 0xcc
> 0xcc
> ...
> 0xcc
> and to create a whole new page like:
> <some code>
> <some more code>
> 0xcc
> ...
> 0xcc
> so that the only difference is that some code changed to some more
> code. Then replace the PTE to swap from the old page to the new page,
> and arrange to avoid freeing the old page until we're sure it's gone
> from all TLBs. This may not work if <some more code> spans a page
> boundary. The #BP fixup would zap the TLB and retry. Even just
> directly copying code over some 0xcc bytes almost works, but there's a
> nasty corner case involving instructions that cross I$ fetch
> boundaries. I'm not sure to what extent I$ snooping helps.
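
The contract of the quoted jit_create_code() can be approximated in
userspace, as a minimal sketch assuming Linux and GCC or Clang. This
sketch does exactly the virtual-memory games (mmap() plus mprotect())
that the proposed kernel interface is meant to avoid, so it only
illustrates the semantics: a new anonymous mapping that contains exactly
the requested code, with the rest of the page filled with 0xCC as in the
appendable-region idea. jit_mapping_len() is a hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: mapping size for len bytes, rounded up to pages. */
static size_t jit_mapping_len(size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    return (len + page - 1) / page * page;
}

/* Userspace approximation of the hypothetical jit_create_code():
 * returns a new read+execute anonymous mapping holding exactly the
 * requested bytes, with the remainder filled with 0xCC (int3 on x86)
 * so that a stray jump into unused space traps. Returns NULL on error.
 * Free with munmap(p, jit_mapping_len(len)). */
static void *jit_create_code(const void *source, size_t len)
{
    if (len == 0)
        return NULL;

    size_t maplen = jit_mapping_len(len);
    unsigned char *p = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memset(p, 0xCC, maplen);
    memcpy(p, source, len);
    /* Required on architectures without coherent I-caches; no-op on x86. */
    __builtin___clear_cache((char *)p, (char *)p + len);

    /* Drop write permission before the code can run (W^X). */
    if (mprotect(p, maplen, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, maplen);
        return NULL;
    }
    return p;
}
```

For example, on x86_64 the six bytes B8 2A 00 00 00 C3
(movl $42, %eax; ret) come back as a callable function that returns 42.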

I think the trampfd API could be used for JIT code as well. I have not
yet worked out the details, but I believe the API is sufficient. For
example:

    struct trampfd_jit {
        void    *source;    /* code bytes to materialize */
        size_t  len;        /* length of the code in bytes */
    };

    struct trampfd_jit  jit;
    struct trampfd_map  map;
    void    *addr;
    int     fd;

    jit.source = blah;
    jit.len = blah;

    /* Create a JIT trampfd, read its mapping parameters, then map it. */
    fd = syscall(440, TRAMPFD_JIT, &jit, flags);
    pread(fd, &map, sizeof(map), TRAMPFD_MAP_OFFSET);
    addr = mmap(NULL, map.size, map.prot, map.flags, fd, map.offset);

And addr would be used to invoke the generated JIT code.
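
The fd-then-mmap flow in the example above can be tried today with
memfd_create(2) standing in for the hypothetical TRAMPFD_JIT call. This
is a minimal sketch assuming Linux with glibc 2.27 or later, and
map_code_via_fd() is an illustrative name. It is closer to the quoted
jittfd_create() idea than to trampfd itself, because userspace writes
the code bytes into the memfd rather than the kernel generating them.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Analog of the trampfd flow: get an fd, fill it with code, then mmap
 * it read+execute. On success, *out_len receives the mapping length
 * for munmap(). Returns NULL on error. */
static void *map_code_via_fd(const void *code, size_t len, size_t *out_len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t maplen = (len + page - 1) / page * page;
    void *addr = MAP_FAILED;

    int fd = memfd_create("jit-code", MFD_CLOEXEC);
    if (fd < 0)
        return NULL;

    if (maplen > 0 &&
        ftruncate(fd, (off_t)maplen) == 0 &&
        pwrite(fd, code, len, 0) == (ssize_t)len)
        addr = mmap(NULL, maplen, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);

    close(fd); /* the mapping keeps its own reference to the memory */
    if (addr == MAP_FAILED)
        return NULL;

    *out_len = maplen;
    return addr;
}
```

The returned address is used like addr above: cast it to a function
pointer of the right type and call it, then release it with munmap().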