Re: [kernel-hardening] Re: [RFC v2][PATCH 04/11] x86: Implement __arch_rare_write_begin/unmap()

From: Andy Lutomirski
Date: Fri Apr 07 2017 - 12:15:07 EST


On Fri, Apr 7, 2017 at 6:30 AM, Mathias Krause <minipli@xxxxxxxxxxxxxx> wrote:
> On 7 April 2017 at 15:14, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> On Fri, 7 Apr 2017, Mathias Krause wrote:
>>> On 7 April 2017 at 11:46, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>> > Whether protected by preempt_disable or local_irq_disable, to make that
>>> > work it needs CR0 handling in the exception entry/exit at the lowest
>>> > level. And that's just a nightmare maintenance-wise as it's prone to be
>>> > broken over time.
>>>
>>> It seems to be working fine for more than a decade now in PaX. So it
>>> can't be such a big maintenance nightmare ;)
>>
>> I really do not care whether PaX wants to chase and verify that over and
>> over. I certainly don't want to take the chance to leak CR0.WP ever and I
>> very much care about extra stuff to check in the entry/exit path.
>
> Fair enough. However, placing a BUG_ON(!(read_cr0() & X86_CR0_WP))
> somewhere sensible should make those "leaks" visible fast -- and their
> exploitation impossible, i.e. fail hard.

The leaks surely exist and now we'll just add an exploitable BUG.
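For reference, the BUG_ON() suggested above tests bit 16 of CR0 (WP, write protect). CR0 can only be read at ring 0, so the sketch below models just the predicate in userspace and takes a CR0 value as a plain argument. cr0_wp_enabled() and CR0_WP_MASK are illustrative names; the mask has the same value as the kernel's X86_CR0_WP.

```c
#include <assert.h>
#include <stdbool.h>

/* CR0.WP is bit 16; same value as the kernel's X86_CR0_WP (illustrative name). */
#define CR0_WP_MASK (1UL << 16)

/* True if supervisor-mode writes to read-only pages would fault. */
bool cr0_wp_enabled(unsigned long cr0)
{
	return (cr0 & CR0_WP_MASK) != 0;
}
```

With a CR0 value commonly seen in x86-64 oops dumps, 0x80050033, the predicate holds; clearing bit 16 gives 0x80040033, for which it does not. A "leak" in this sense is any path that returns to normal kernel code with the second value still loaded.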

>>> > It's valid (at least on x86) to have a shadow map with the same page
>>> > attributes but write enabled. That does not require any fixups of CR0 and
>>> > just works.
>>>
>>> "Just works", sure -- but it's not as tightly focused as the PaX
>>> solution which is CPU local, while your proposed solution is globally
>>> visible.
>>
>> Making the world and some more writeable hardly qualifies as tightly
>> focussed. Making the mapping concept CPU local is not rocket science
>> either. The question is whether it's worth the trouble.
>
> No, the question is if the value of the concept is well understood and
> if people can see what could be done with such a strong primitive.
> Apparently not...

I think we're approaching this all wrong, actually. The fact that x86
has this CR0.WP thing is arguably a historical accident, and the fact
that PaX uses it doesn't mean that PaX is doing it the best way for
upstream Linux.

Why don't we start at the other end and do a generic non-arch-specific
implementation: set up an mm_struct that contains an RW alias of the
relevant parts of rodata and use use_mm to access it. (That is,
get_fs() to back up the old fs, set_fs(USER_DS),
use_mm(&rare_write_mm), do the write using copy_to_user, undo
everything.)
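A rough userspace analog of the alias idea, assuming Linux with glibc 2.27 or later for memfd_create(): the same memory is mapped twice, once read-only and once writable, and modifications go only through the writable alias. struct rare_region, rare_region_init(), rare_write() and rare_region_destroy() are hypothetical names. Unlike the proposal above, where the alias lives in a separate mm_struct that is only installed via use_mm() for the duration of the write, both mappings here sit in one address space, so the alias is visible to the whole process. That is exactly the "globally visible" concern raised earlier in the thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical handle: a read-only view plus a writable alias of the same memfd. */
struct rare_region {
	const char *ro;	/* what ordinary code sees: PROT_READ only */
	char *rw;	/* writable alias, only touched by rare_write() */
	size_t len;
};

/* Create a memfd of `len` bytes and map it twice. Returns 0 on success. */
int rare_region_init(struct rare_region *r, size_t len)
{
	int fd = memfd_create("rare_write_demo", MFD_CLOEXEC);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	void *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mappings keep the memory alive */
	if (ro == MAP_FAILED || rw == MAP_FAILED) {
		if (ro != MAP_FAILED)
			munmap(ro, len);
		if (rw != MAP_FAILED)
			munmap(rw, len);
		return -1;
	}
	r->ro = ro;
	r->rw = rw;
	r->len = len;
	return 0;
}

/* Copy `n` bytes to offset `off`, going through the writable alias. */
void rare_write(struct rare_region *r, size_t off, const void *src, size_t n)
{
	assert(off <= r->len && n <= r->len - off);
	memcpy(r->rw + off, src, n);
}

void rare_region_destroy(struct rare_region *r)
{
	munmap((void *)r->ro, r->len);
	munmap(r->rw, r->len);
}
```

A store through r->ro faults with SIGSEGV, while data written with rare_write() shows up immediately through r->ro because both views share the same pages.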

Then someone who cares about performance can benchmark the CR0.WP
approach against it and try to argue that it's a good idea. This
benchmark should wait until I'm done with my PCID work, because PCID
is going to make use_mm() a whole heck of a lot faster.

--Andy