Re: [RFC PATCH 2/7] x86/sci: add core implementation for system call isolation
From: Andy Lutomirski
Date: Fri Apr 26 2019 - 15:22:09 EST
> On Apr 26, 2019, at 11:49 AM, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 2019-04-26 at 10:40 -0700, Andy Lutomirski wrote:
>>> On Apr 26, 2019, at 8:19 AM, James Bottomley
>>> <James.Bottomley@hansenpartnership.com> wrote:
>>>
>>> On Fri, 2019-04-26 at 08:07 -0700, Andy Lutomirski wrote:
>>>>> On Apr 26, 2019, at 7:57 AM, James Bottomley
>>>>> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>>> On Fri, 2019-04-26 at 07:46 -0700, Dave Hansen wrote:
>>>>>>> On 4/25/19 2:45 PM, Mike Rapoport wrote:
>>>>>>> After the isolated system call finishes, the mappings
>>>>>>> created during its execution are cleared.
>>>>>>
>>>>>> Yikes. I guess that stops someone from calling write() a
>>>>>> bunch of times on every filesystem using every block device
>>>>>> driver and all the DM code to get a lot of code/data faulted
>>>>>> in. But, it also means not even long-running processes will
>>>>>> ever have a chance of behaving anything close to normally.
>>>>>>
>>>>>> Is this something you think can be rectified or is there
>>>>>> something fundamental that would keep SCI page tables from
>>>>>> being cached across different invocations of the same
>>>>>> syscall?
>>>>>
>>>>> There is some work being done to look at pre-populating the
>>>>> isolated address space with the expected execution footprint of
>>>>> the system call, yes. It lessens the ROP gadget protection
>>>>> slightly because you might find a gadget in the pre-populated
>>>>> code, but it solves a lot of the overhead problem.
>>>>
>>>> I'm not even remotely a ROP expert, but: what stops a ROP payload
>>>> from using all the "fault-in" gadgets that exist -- any function
>>>> that can return on an error without doing too much will fault in
>>>> the whole page containing the function.
>>>
>>> The address space pre-population is still per syscall, so you don't
>>> get access to the code footprint of a different syscall. So the
>>> isolated address space is created anew for every system call, it's
>>> just pre-populated with that system call's expected footprint.
>>
>> That's not what I mean. Suppose I want to use a ROP gadget in
>> vmalloc(), but vmalloc isn't in the page tables. Then first push
>> vmalloc itself onto the stack. As long as RDI contains a sufficiently
>> ridiculous value, it should just return without doing anything. And
>> it can return right back into the ROP gadget, which is now available.
>
> Yes, it's not perfect, but stack space for a smashing attack is at a
> premium and now you need two stack frames for every gadget you chain
> instead of one so we've halved your ability to chain gadgets.
>
>>>> To improve this, we would want something that would try to check
>>>> whether the caller is actually supposed to call the callee, which
>>>> is more or less the hard part of CFI. So can't we just do CFI
>>>> and call it a day?
>>>
>>> By CFI you mean control flow integrity? In theory I believe so,
>>> yes, but in practice doesn't it require a lot of semantic object
>>> information which is easy to get from higher level languages like
>>> Java but a bit more difficult for plain C?
>>
>> Yes. As I understand it, grsecurity instruments gcc to create some
>> kind of hash of all function signatures. Then any indirect call can
>> effectively verify that it's calling a function of the right type.
>> And every return verifies a cookie.
>>
>> On CET CPUs, RET gets checked directly, and I don't see the benefit
>> of SCI.
>
> Presumably you know something I don't but I thought CET CPUs had been
> planned for release for ages, but not actually released yet?
I don't know any secrets about this, but I don't think it's released. Last I checked, it didn't even have a final public spec.
>
>>>> On top of that, a robust, maintainable implementation of this
>>>> thing seems very complicated -- for example, what happens if
>>>> vfree() gets called?
>>>
>>> Address-space-local vs. global object tracking is another thing on
>>> our list. What we'd probably do is verify the global object was
>>> allowed to be freed and then hand it off safely to the main kernel
>>> address space.
>>
>> This seems exceedingly complicated.
>
> It's a research project: we're exploring what's possible so we can
> choose the techniques that give the best security improvement for the
> additional overhead.
>
:)
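
For readers unfamiliar with the signature-hash idea mentioned above, here is a minimal user-space sketch of checking an indirect call against a type tag. It is only an illustration, not grsecurity's implementation: the tag values, struct tagged_fn, cfi_check() and call_int_from_int() are all hypothetical names invented for this example, and a real scheme would have the compiler derive the tags from function signatures and emit the check at each call site.

```c
#include <stdint.h>

/* Hypothetical type tags. In a compiler-based scheme these would be
 * hashes derived from each function's signature; the values here are
 * arbitrary constants chosen for illustration. */
#define SIG_INT_FROM_INT   0x1a2b3c4du
#define SIG_VOID_FROM_PTR  0x5e6f7081u

/* A call target paired with the tag of its signature (assumed layout). */
struct tagged_fn {
    uint32_t sig;        /* tag describing the target's real type */
    void (*fn)(void);    /* target stored as a generic function pointer */
};

static int add_one(int x) { return x + 1; }
static void drop(void *p) { (void)p; }

/* Returns 1 if the target's tag matches what the call site expects. */
static int cfi_check(const struct tagged_fn *t, uint32_t expected)
{
    return t->sig == expected;
}

/* An "int f(int)" call site: refuse to call anything tagged otherwise.
 * Returns 0 and stores the result in *out on success, -1 on mismatch. */
static int call_int_from_int(const struct tagged_fn *t, int arg, int *out)
{
    if (!cfi_check(t, SIG_INT_FROM_INT))
        return -1;
    /* Cast back to the real type before calling, as C requires. */
    *out = ((int (*)(int))t->fn)(arg);
    return 0;
}
```

With a struct tagged_fn holding add_one under SIG_INT_FROM_INT, call_int_from_int() performs the call; with one holding drop under SIG_VOID_FROM_PTR, it returns -1 without calling anything. The sketch covers only the forward (indirect call) edge; the return-side cookie check is not modeled.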