Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

From: Thomas Gleixner
Date: Sun Nov 16 2014 - 18:44:38 EST

On Fri, 14 Nov 2014, Kees Cook wrote:
> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
> > should use attached one instead.
> >
> > 1. should use _brk_end instead of &end, as we only use partial of
> > brk.
> > 2. [_brk_end, pm_end) page range is already converted. aka
> > is not wasted.
> Are you sure? For me, _brk_end isn't far enough:
> [ 1.475572] all_end: 0xffffffff82df5000
> [ 1.476736] _brk_end: 0xffffffff82dd6000

_brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
which is the same as _end. We usually do not use all of that space. So
it's expected that _brk_end < _end.
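
To make the relationship between _brk_end and _end concrete, here is a
minimal userspace model of a brk-style bump allocator. It is a sketch,
not the kernel code: struct brk_model and model_extend_brk() are
hypothetical names, and returning 0 on overflow is an assumption made
for illustration (as far as I recall the kernel treats that case as
fatal).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; the field names only mirror the
 * kernel symbols, this is not the kernel implementation. */
struct brk_model {
    uintptr_t start;  /* models the start of the brk area */
    uintptr_t end;    /* models _brk_end, grows on every allocation */
    uintptr_t limit;  /* models __brk_limit, the same address as _end */
};

/* Models extend_brk(): align the current end, hand out 'size' bytes
 * and move the end upwards. Returns 0 if the request does not fit
 * (assumption for this sketch). */
static uintptr_t model_extend_brk(struct brk_model *b, size_t size,
                                  size_t align)
{
    /* The mask arithmetic below only works for power-of-two alignment. */
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t ret = (b->end + align - 1) & ~(uintptr_t)(align - 1);

    if (ret > b->limit || size > b->limit - ret)
        return 0;

    b->end = ret + size;
    return ret;
}
```

As long as the boot-time callers request less than the space set aside
at link time, the modeled end stays below the limit, which is the
_brk_end < _end situation shown in the log above.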

> Is this correct? It sounded like tglx wanted the pmd split, like this:

Yes, I wanted to get rid of the high mapping for anything between
_brk_end and _end, and I brought you on the wrong track with my
suggestion to call free_init_pages(). Sorry about that.

That happened because I missed the completely non-obvious fact that
only the effective brk section is reserved for the kernel via
reserve_brk(). So the area between _brk_end and _end is already
reusable. That reuse works only by chance and not by design, though,
and is completely undocumented, like everything else in that area.

So the initial patch to get rid of the X mapping is of course to just
extend the area to the PMD. A little bit different to your initial
patch, but essentially the same.

- unsigned long all_end = PFN_ALIGN(&_end);
+ unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);

I'm going to apply your V1 patch with the above roundup()
simplification. If a page of that area gets used later on then we are
going to split up the PMD anyway.
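
The effect of the roundup() change can be checked with plain
arithmetic. The sketch below is a userspace illustration under stated
assumptions: a 2 MiB PMD_SIZE as on x86-64 with 4 KiB pages, and a
hypothetical helper sketch_round_up() that uses the power-of-two mask
form rather than the kernel's generic roundup() macro.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000ULL    /* 4 KiB base page (assumption) */
#define SKETCH_PMD_SIZE  0x200000ULL  /* 2 MiB PMD mapping (assumption) */

/* Round addr up to the next multiple of align. align must be a
 * power of two for the mask form to be valid. */
static uint64_t sketch_round_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

Feeding in the all_end value from the quoted boot log,
sketch_round_up(0xffffffff82df5000ULL, SKETCH_PMD_SIZE) gives
0xffffffff82e00000, while rounding the same address to
SKETCH_PAGE_SIZE leaves it unchanged. The difference between the two
results is the tail of the PMD that the page-aligned version left
outside the NX range.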

But still we want to get rid of that highmap between _brk_end and
_end, but there is absolutely no reason to come up with extra silly
functions for that.

So the obvious solution is to let setup_arch() reserve the memory up
to _end instead of _bss_stop, get rid of the extra reservation in
reserve_brk() and then let free_initmem() release the area between
_brk_end and _end. No extra hackery, no side effects, just works.
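
To get a feeling for how much memory that release would cover, the
size of the range can be computed from the two addresses in the quoted
boot log. This is only a userspace sketch: sketch_pages_between() is a
hypothetical helper, 4 KiB pages are assumed, and the logged all_end
value is taken as the page-aligned _end.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PG_SIZE 0x1000ULL  /* 4 KiB base page (assumption) */

/* Number of whole pages in [start, end). Both addresses must be
 * page aligned and start must not be above end. */
static uint64_t sketch_pages_between(uint64_t start, uint64_t end)
{
    assert(start % SKETCH_PG_SIZE == 0);
    assert(end % SKETCH_PG_SIZE == 0);
    assert(start <= end);
    return (end - start) / SKETCH_PG_SIZE;
}
```

With _brk_end at 0xffffffff82dd6000 and all_end at 0xffffffff82df5000
the helper returns 31, i.e. 0x1f000 bytes of unused brk space on that
particular boot.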

I spent quite some time staring at that and I wonder about the
following related issues:

1) Why is the mark_rodata_ro() business a debug configuration, i.e

This does not make any sense at all. We really want RO and NX on by
default and AFAICT distros are turning that on anyway for obvious
reasons.

The only idiocy I found so far is the kgdb Documentation which
recommends to turn it off. Sigh.

So that should be changed to:





bool "WTF?"

2) What is actually the modules counterpart for mark_rodata_ro()?


Of course not enabled by default, but enabled by distros again.

See #1.

Now what's interesting aside of the general fuckup is that
CONFIG_DEBUG_RODATA is supported by:

arch/x86 and arch/parisc


arch/arm/ arch/arm64 arch/s390 and arch/x86

This does not make any sense at all.

Do arm/arm64/s390 have other means to make RO/NX work or are they
just doing it for modules? And how is that supposed to work with
KGDB if it is not aware of modules sections being RO/NX? KGDB has
only extra magic for CONFIG_DEBUG_RODATA, but not for

Now for extended fun the x86 help text for that option says:

... Such protection may interfere with run-time code
patching and dynamic kernel tracing - and they might also protect
against certain classes of kernel exploits.
If in doubt, say "N".

Patently wrong. More sigh.

3) Why is mark_rodata_ro() called AFTER free_initmem() and therefore
cannot be marked __init ?

Just because ...

