Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

From: Kees Cook
Date: Mon Nov 17 2014 - 15:28:06 EST

On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Fri, 14 Nov 2014, Kees Cook wrote:
>> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>> > should use attached one instead.
>> >
>> > 1. should use _brk_end instead of &end, as we only use partial of
>> > brk.
>> > 2. [_brk_end, pm_end) page range is already converted. aka
>> > is not wasted.
>> Are you sure? For me, _brk_end isn't far enough:
>> [ 1.475572] all_end: 0xffffffff82df5000
>> [ 1.476736] _brk_end: 0xffffffff82dd6000
> _brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
> which is the same as _end. We usually do not use all of that space. So
> it's expected that _brk_end < _end.
>> Is this correct? It sounded like tglx wanted the pmd split, like this:
> Yes, I wanted to get rid of the high mapping for anything between
> _brk_end and _end, and I brought you on the wrong track with my
> suggestion to call free_init_pages(). Sorry about that.
> That happened because I missed the completely non obvious fact, that
> only the effective brk section is reserved for the kernel via
> reserve_brk(). So the area between _brk_end and _end is already
> reusable. Though that reuse works only by chance and not by design and
> is completely undocumented as everything else in that area.
> So the initial patch to get rid of the X mapping is of course to just
> extend the area to the PMD. A little bit different to your initial
> patch, but essentially the same.
> - unsigned long all_end = PFN_ALIGN(&_end);
> + unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);
> I'm going to apply your V1 patch with the above roundup()
> simplification. If a page of that area gets used later on then we are
> going to split up the PMD anyway.

That's fine by me. Yinghai Lu seems to have potentially better
solutions, but my head is spinning after reading more of this thread.
:) I just want to make sure that at the end of the day, there are no
RW+x mappings.

> But still we want to get rid of that highmap between _brk_end and
> _end, but there is absolutely no reason to come up with extra silly
> functions for that.
> So the obvious solution is to let setup_arch() reserve the memory up
> to _end instead of _bss_stop, get rid of the extra reservation in
> reserve_brk() and then let free_initmem() release the area between
> _brk_end and _end. No extra hackery, no side effects, just works.
> I spent quite some time to stare into that and I wonder about the
> following related issues:
> 1) Why is the mark_rodata_ro() business a debug configuration, i.e
> This does not make any sense at all. We really want RO and NX on by
> default and AFAICT distros are turning that on anyway for obvious
> reasons.

This is a historically badly named config item. Once arm and arm64
land their CONFIG_DEBUG_RODATA implementations, I was going to suggest
having this renamed.

> The only idiocity I found so far is the kgdb Documentation which
> recommends to turn it off. Sigh.
> So that should be changed to:
> config ARCH_HAS_RONX
> bool
> plus
> bool
> bool "WTF?"
> 2) What is actually the modules counterpart for mark_rodata_ro()?
> Of course not enabled by default, but enabled by distros again.
> See #1.
> Now what's interesting aside of the general fuckup is that
> CONFIG_DEBUG_RODATA is supported by:
> arch/x86 and arch/parisc
> But CONFIG_DEBUG_SET_MODULE_RONX is supported by:
> arch/arm/ arch/arm64 arch/s390 and arch/x86

I have a series for arm that is waiting to be picked up by rmk:

Laura has been working on a series for arm64:

So what you're seeing is us being in the middle of providing support
for it. It just happens that CONFIG_DEBUG_SET_MODULE_RONX is a bit
easier to implement, so it was completed first.

> This does not make any sense at all.
> Do arm/arm64/s390 have other means to make RO/NX work or are they
> just doing it for modules? And how is that supposed to work with
> KGDB if it is not aware of modules sections being RO/NX? KGDB has
> only extra magic for CONFIG_DEBUG_RODATA, but not for
> Now for extended fun the x86 help text for that option says:
> ... Such protection may interfere with run-time code
> patching and dynamic kernel tracing - and they might also protect
> against certain classes of kernel exploits.
> If in doubt, say "N".
> Patently wrong. More sigh.

Like the CONFIG naming, I hope to start cleaning these defaults up
once arm and arm64 are landed.

> 3) Why is mark_rodata_ro() called AFTER free_initmem() and therefor
> cannot be marked __init ?
> Just because ...

That's worth examining. Since most of the logic that does the ro/nx
work needs to stick around for things like ftrace, kgdb, etc, it's
possible there wouldn't be much savings from making mark_rodata_ro as
__init. I'll add this to my list to check.



Kees Cook
Chrome OS Security
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at