Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
From: Mickaël Salaün
Date: Wed Aug 19 2020 - 14:54:33 EST
On 12/08/2020 12:06, Mark Rutland wrote:
> On Thu, Aug 06, 2020 at 12:26:02PM -0500, Madhavan T. Venkataraman wrote:
>> Thanks for the lively discussion. I have tried to answer some of the
>> comments below.
>>
>> On 8/4/20 9:30 AM, Mark Rutland wrote:
>>>
>>>> So, the context is - if security settings in a system disallow a page to have
>>>> both write and execute permissions, how do you allow the execution of
>>>> genuine trampolines that are runtime generated and placed in a data
>>>> page or a stack page?
>>> There are options today, e.g.
>>>
>>> a) If the restriction is only per-alias, you can have distinct aliases
>>> where one is writable and another is executable, and you can make it
>>> hard to find the relationship between the two.
>>>
>>> b) If the restriction is only temporal, you can write instructions into
>>> an RW- buffer, transition the buffer to R--, verify the buffer
>>> contents, then transition it to --X.
>>>
>>> c) You can have two processes A and B where A generates instrucitons into
>>> a buffer that (only) B can execute (where B may be restricted from
>>> making syscalls like write, mprotect, etc).
>>
>> The general principle of the mitigation is W^X. I would argue that
>> the above options are violations of the W^X principle. If they are
>> allowed today, they must be fixed. And they will be. So, we cannot
>> rely on them.
>
> Hold on.
>
> Contemporary W^X means that a given virtual alias cannot be writeable
> and executeable simultaneously, permitting (a) and (b). If you read the
> references on the Wikipedia page for W^X you'll see the OpenBSD 3.3
> release notes and related presentation make this clear, and further they
> expect (b) to occur with JITS flipping W/X with mprotect().
W^X (with "permanent" mprotect restrictions [1]) goes back to 2000 with
PaX [2] (which predates partial OpenBSD implementation from 2003).
[1] https://pax.grsecurity.net/docs/mprotect.txt
[2] https://undeadly.org/cgi?action=article;sid=20030417082752
>
> Please don't conflate your assumed stronger semantics with the general
> principle. It not matching you expectations does not necessarily mean
> that it is wrong.
>
> If you want a stronger W^X semantics, please refer to this specifically
> with a distinct name.
>
>> a) This requires a remap operation. Two mappings point to the same
>> physical page. One mapping has W and the other one has X. This
>> is a violation of W^X.
>>
>> b) This is again a violation. The kernel should refuse to give execute
>> permission to a page that was writeable in the past and refuse to
>> give write permission to a page that was executable in the past.
>>
>> c) This is just a variation of (a).
>
> As above, this is not true.
>
> If you have a rationale for why this is desirable or necessary, please
> justify that before using this as justification for additional features.
>
>> In general, the problem with user-level methods to map and execute
>> dynamic code is that the kernel cannot tell if a genuine application is
>> using them or an attacker is using them or piggy-backing on them.
>
> Yes, and as I pointed out the same is true for trampfd unless you can
> somehow authenticate the calls are legitimate (in both callsite and the
> set of arguments), and I don't see any reasonable way of doing that.
>
> If you relax your threat model to an attacker not being able to make
> arbitrary syscalls, then your suggestion that userspace can perorm
> chceks between syscalls may be sufficient, but as I pointed out that's
> equally true for a sealed memfd or similar.
>
>> Off the top of my head, I have tried to identify some examples
>> where we can have more trust on dynamic code and have the kernel
>> permit its execution.
>>
>> 1. If the kernel can do the job, then that is one safe way. Here, the kernel
>> is the code. There is no code generation involved. This is what I
>> have presented in the patch series as the first cut.
>
> This is sleight-of-hand; it doesn't matter where the logic is performed
> if the power is identical. Practically speaking this is equivalent to
> some dynamic code generation.
>
> I think that it's misleading to say that because the kernel emulates
> something it is safe when the provenance of the syscall arguments cannot
> be verified.
>
> [...]
>
>> Anyway, these are just examples. The principle is - if we can identify
>> dynamic code that has a certain measure of trust, can the kernel
>> permit their execution?
>
> My point generally is that the kernel cannot identify this, and if
> usrspace code is trusted to dynamically generate trampfd arguments it
> can equally be trusted to dyncamilly generate code.
>
> [...]
>
>> As I have mentioned above, I intend to have the kernel generate code
>> only if the code generation is simple enough. For more complicated cases,
>> I plan to use a user-level code generator that is for exclusive kernel use.
>> I have yet to work out the details on how this would work. Need time.
>
> This reads to me like trampfd is only dealing with a few special cases
> and we know that we need a more general solution.
>
> I hope I am mistaken, but I get the strong impression that you're trying
> to justify your existing solution rather than trying to understand the
> problem space.
>
> To be clear, my strong opinion is that we should not be trying to do
> this sort of emulation or code generation within the kernel. I do think
> it's worthwhile to look at mechanisms to make it harder to subvert
> dynamic userspace code generation, but I think the code generation
> itself needs to live in userspace (e.g. for ABI reasons I previously
> mentioned).
>
> Mark.
>