Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
From: Thomas Gleixner
Date: Sun Nov 16 2014 - 18:44:38 EST
On Fri, 14 Nov 2014, Kees Cook wrote:
> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
> > should use attached one instead.
> >
> > 1. should use _brk_end instead of &end, as we only use partial of
> > brk.
> > 2. [_brk_end, pm_end) page range is already converted. aka
> > is not wasted.
>
> Are you sure? For me, _brk_end isn't far enough:
>
> [ 1.475572] all_end: 0xffffffff82df5000
> [ 1.476736] _brk_end: 0xffffffff82dd6000
_brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
which is the same as _end. We usually do not use all of that space. So
it's expected that _brk_end < _end.
> Is this correct? It sounded like tglx wanted the pmd split, like this:
Yes, I wanted to get rid of the high mapping for anything between
_brk_end and _end, and I brought you on the wrong track with my
suggestion to call free_init_pages(). Sorry about that.
That happened because I missed the completely non obvious fact, that
only the effective brk section is reserved for the kernel via
reserve_brk(). So the area between _brk_end and _end is already
reusable. Though that reuse works only by chance and not by design and
is completely undocumented as everything else in that area.
So the initial patch to get rid of the X mapping is of course to just
extend the area to the PMD. A little bit different to your initial
patch, but essentially the same.
- unsigned long all_end = PFN_ALIGN(&_end);
+ unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);
I'm going to apply your V1 patch with the above roundup()
simplification. If a page of that area gets used later on then we are
going to split up the PMD anyway.
But still we want to get rid of that highmap between _brk_end and
_end, but there is absolutely no reason to come up with extra silly
functions for that.
So the obvious solution is to let setup_arch() reserve the memory up
to _end instead of _bss_stop, get rid of the extra reservation in
reserve_brk() and then let free_initmem() release the area between
_brk_end and _end. No extra hackery, no side effects, just works.
I spent quite some time to stare into that and I wonder about the
following related issues:
1) Why is the mark_rodata_ro() business a debug configuration, i.e
CONFIG_DEBUG_RODATA?
This does not make any sense at all. We really want RO and NX on by
default and AFAICT distros are turning that on anyway for obvious
reasons.
The only idiocity I found so far is the kgdb Documentation which
recommends to turn it off. Sigh.
So that should be changed to:
config ARCH_HAS_RONX
bool
config DISABLE_RONX
def_bool !ARCH_HAS_RONX || !KGDB_ENABLE_SECURITY_HOLES
plus
config ARCH_WANTS_KGDB_SECURITY_HOLES
bool
config KGDB_ENABLE_SECURITY_HOLES
bool "WTF?"
depends on BROKEN && ARCH_WANTS_KGDB_SECURITY_HOLES
2) What is actually the modules counterpart for mark_rodata_ro()?
CONFIG_DEBUG_SET_MODULE_RONX
Of course not enabled by default, but enabled by distros again.
See #1.
Now what's interesting aside of the general fuckup is that
CONFIG_DEBUG_RODATA is supported by:
arch/x86 and arch/parisc
But CONFIG_DEBUG_SET_MODULE_RONX is supported by:
arch/arm/ arch/arm64 arch/s390 and arch/x86
This does not make any sense at all.
Do arm/arm64/s390 have other means to make RO/NX work or are they
just doing it for modules? And how is that supposed to work with
KGDB if it is not aware of modules sections being RO/NX? KGDB has
only extra magic for CONFIG_DEBUG_RODATA, but not for
CONFIG_DEBUG_SET_MODULE_RONX.
Now for extended fun the x86 help text for that option says:
... Such protection may interfere with run-time code
patching and dynamic kernel tracing - and they might also protect
against certain classes of kernel exploits.
If in doubt, say "N".
Patently wrong. More sigh.
3) Why is mark_rodata_ro() called AFTER free_initmem() and therefor
cannot be marked __init ?
Just because ...
Thanks,
tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/