Re: [PATCH] arm: Use kernel mm when updating section permissions

From: Laura Abbott
Date: Fri Nov 06 2015 - 13:44:41 EST

On 11/05/2015 05:15 PM, Kees Cook wrote:
On Thu, Nov 5, 2015 at 5:05 PM, Laura Abbott <labbott@xxxxxxxxxx> wrote:
On 11/05/2015 08:27 AM, Russell King - ARM Linux wrote:

On Thu, Nov 05, 2015 at 08:20:42AM -0800, Laura Abbott wrote:

On 11/05/2015 01:46 AM, Russell King - ARM Linux wrote:

On Wed, Nov 04, 2015 at 05:00:39PM -0800, Laura Abbott wrote:

Currently, read only permissions are not being applied even
when CONFIG_DEBUG_RODATA is set. This is because section_update
uses current->mm for adjusting the page tables. current->mm
need not be equivalent to the kernel version. Use pgd_offset_k
to get the proper page directory for updating.

What are you trying to achieve here? You can't use these functions
at run time (after the first thread has been spawned) to change
permissions, because there will be multiple copies of the kernel
section mappings, and those copies will not get updated.

In any case, this change will probably break kexec and ftrace, as
the running thread will no longer see the updated page tables.

I think I was hitting that exact problem with multiple copies
not getting updated. The section_update code was being called
and I was seeing the tables get updated but nothing was being
applied when I tried to write to text or check the debugfs
page table. The current flow is:

rest_init -> kernel_thread(kernel_init) and from that thread
mark_rodata_ro. So mark_rodata_ro is always going to happen
in a thread.

Do we need to update for both init_mm and the first running thread?

The "first running thread" is merely coincidental for things like kexec.

Hmm. Actually, I think the existing code _should_ be fine. At the
point where mark_rodata_ro() is, we should still be using init_mm, so
updating the current thread's page tables should actually be updating
the swapper_pg_dir.

That doesn't seem to hold true. Based on what I'm seeing, we lose
the guarantee of init_mm after the first exec. If usermodehelper
gets called to load a module, that triggers an exec and the kernel
thread is no longer using init_mm after that. I'm testing with the
multi-v7 defconfig which uses the smsc911x driver which loads a
module during initcall. That gets called before mark_rodata_ro so
the init_mm is never updated. I verified that disabling smsc911x
makes things work as expected. I suspect the testing was never done
with a driver that tried to call usermodehelper during init time.
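
To illustrate the failure mode described above, here is a minimal
user-space sketch. It is not kernel code: struct mm_model,
model_exec(), model_section_update() and the SECT_RDONLY bit are all
hypothetical stand-ins, and it assumes only that an exec leaves the
thread with its own copy of the kernel section entries.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_SECTIONS 4        /* hypothetical: tiny table for illustration */
#define SECT_RDONLY 0x1u     /* hypothetical read-only bit */

/* Hypothetical stand-in for an mm and its section entries. */
struct mm_model {
	uint32_t pmd[NR_SECTIONS];
};

/* Stand-in for init_mm (swapper_pg_dir). */
static struct mm_model init_mm_model;

/* Model of exec: the new mm starts from a copy of the kernel mappings. */
static void model_exec(struct mm_model *new_mm)
{
	memcpy(new_mm->pmd, init_mm_model.pmd, sizeof(new_mm->pmd));
}

/* Model of a section update: touches only the mm it is given. */
static void model_section_update(struct mm_model *mm, unsigned int idx,
				 uint32_t mask, uint32_t prot)
{
	assert(idx < NR_SECTIONS);
	mm->pmd[idx] = (mm->pmd[idx] & mask) | prot;
}

/*
 * Returns 1 when an update made through the post-exec mm leaves the
 * init_mm stand-in without the read-only bit.
 */
static int init_mm_left_stale(void)
{
	struct mm_model active;

	memset(&init_mm_model, 0, sizeof(init_mm_model));
	model_exec(&active);	/* e.g. usermodehelper triggering an exec */
	model_section_update(&active, 1, ~SECT_RDONLY, SECT_RDONLY);

	return (active.pmd[1] & SECT_RDONLY) &&
	       !(init_mm_model.pmd[1] & SECT_RDONLY);
}
```

In this model the copy picks up the read-only bit while the init_mm
stand-in stays writable, which matches what I see in the debugfs dump.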

Ooooh. Nice catch. Yeah, my testing didn't include that case.

I narrowed it down to somewhere after the usermodehelper call, but I
wasn't able to pinpoint exactly where the switch happens. It seems
like we need to have the page tables set up before any initcalls
run; otherwise we risk having an exec create stray processes whose
page tables we can't update.

Can we just make mark_rodata_ro() a no-op and do the RO setting
earlier when we do the NX setting?

Unfortunately no. The NX setting is done before we've finished with the
initmem, so we need the initmem to be finished and freed before we can
mark anything RO.

More importantly, the NX settings are also not getting set. Compare before:

---[ Kernel Mapping ]---
0xc0000000-0xc0300000 3M RW NX
0xc0300000-0xc1300000 16M RW x
0xc1300000-0xcc000000 173M RW NX
0xcc000000-0xcc040000 256K RW NX MEM/BUFFERABLE/WC
0xcc040000-0xcc100000 768K RW NX MEM/CACHED/WBRA
0xcc100000-0xcc280000 1536K RW NX MEM/BUFFERABLE/WC
0xcc280000-0xd0000000 62976K RW NX MEM/CACHED/WBRA
0xd0000000-0xd0200000 2M RW NX

and after

---[ Kernel Mapping ]---
0xc0000000-0xc0300000 3M RW NX
0xc0300000-0xc0c00000 9M ro x
0xc0c00000-0xc1100000 5M ro NX
0xc1100000-0xcc000000 175M RW NX
0xcc000000-0xcc040000 256K RW NX MEM/BUFFERABLE/WC
0xcc040000-0xcc100000 768K RW NX MEM/CACHED/WBRA
0xcc100000-0xcc280000 1536K RW NX MEM/BUFFERABLE/WC
0xcc280000-0xd0000000 62976K RW NX MEM/CACHED/WBRA
0xd0000000-0xd0200000 2M RW NX

with my test patch. I think setting both current->active_mm and &init_mm
is sufficient. Maybe explicitly setting swapper_pg_dir would be cleaner?
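
As a quick sanity check on the two dumps, the sizes can be recomputed
from the addresses. This is a small user-space sketch; struct
range_sketch and range_mib() are hypothetical helpers, and only the
addresses come from the dumps above.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024u * 1024u)

/* Hypothetical helper type: one [start, end) line from the dump. */
struct range_sketch {
	uint32_t start;
	uint32_t end;
};

/* Size of a dumped range in MiB. */
static uint32_t range_mib(struct range_sketch r)
{
	assert(r.end >= r.start);
	return (r.end - r.start) / MiB;
}

/* Before: the "RW x" region followed by the large "RW NX" region. */
static const struct range_sketch before_ranges[] = {
	{ 0xc0300000, 0xc1300000 },
	{ 0xc1300000, 0xcc000000 },
};

/* After: "ro x", "ro NX", then the large "RW NX" region. */
static const struct range_sketch after_ranges[] = {
	{ 0xc0300000, 0xc0c00000 },
	{ 0xc0c00000, 0xc1100000 },
	{ 0xc1100000, 0xcc000000 },
};
```

Both sets cover the same span (0xc0300000-0xcc000000). The former
16M "RW x" region becomes 9M "ro x" plus 5M "ro NX", and the
remaining 2M joins the "RW NX" region, which grows from 173M to 175M.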

Is there a test that should be running in a CI somewhere to catch cases like
this where the permissions are not working as expected?

My test patch that seems to be working:


diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 8a63b4c..6276b234 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -627,12 +627,10 @@ static struct section_perm ro_perms[] = {
* safe to be called with preemption disabled, as under stop_machine().
static inline void section_update(unsigned long addr, pmdval_t mask,
- pmdval_t prot)
+ pmdval_t prot, struct mm_struct *mm)
- struct mm_struct *mm;
pmd_t *pmd;
- mm = current->active_mm;
pmd = pmd_offset(pud_offset(pgd_offset(mm, addr), addr), addr);
@@ -656,7 +654,7 @@ static inline bool arch_has_strict_perms(void)
return !!(get_cr() & CR_XP);
-#define set_section_perms(perms, field) { \
+#define set_section_perms(perms, field, all) { \
size_t i; \
unsigned long addr; \
@@ -674,31 +672,35 @@ static inline bool arch_has_strict_perms(void)
for (addr = perms[i].start; \
addr < perms[i].end; \
- addr += SECTION_SIZE) \
+ addr += SECTION_SIZE) { \
section_update(addr, perms[i].mask, \
- perms[i].field); \
+ perms[i].field, current->active_mm); \
+ if (all) \
+ section_update(addr, perms[i].mask, \
+ perms[i].field, &init_mm); \
+ } \
} \
-static inline void fix_kernmem_perms(void)
+void fix_kernmem_perms(void)
- set_section_perms(nx_perms, prot);
+ set_section_perms(nx_perms, prot, true);
void mark_rodata_ro(void)
- set_section_perms(ro_perms, prot);
+ set_section_perms(ro_perms, prot, true);
void set_kernel_text_rw(void)
- set_section_perms(ro_perms, clear);
+ set_section_perms(ro_perms, clear, false);
void set_kernel_text_ro(void)
- set_section_perms(ro_perms, prot);
+ set_section_perms(ro_perms, prot, false);
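
For readers who want the shape of the change without the kernel
context, here is a rough user-space model of the patched loop. The
sketch_* names and types are hypothetical, sections are indexed by
number rather than by address, and the mask/prot arithmetic is an
assumption about what section_update does to each entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTIONS 8	/* hypothetical table size */

/* Hypothetical stand-in for an mm and its section entries. */
struct mm_sketch {
	uint32_t pmd[MODEL_SECTIONS];
};

/* Hypothetical stand-in for struct section_perm, indexed by section. */
struct perm_sketch {
	size_t start;		/* first section index */
	size_t end;		/* one past the last section index */
	uint32_t mask;
	uint32_t prot;
};

/* Update one entry in the mm that the caller passes in. */
static void sketch_section_update(struct mm_sketch *mm, size_t idx,
				  uint32_t mask, uint32_t prot)
{
	assert(idx < MODEL_SECTIONS);
	mm->pmd[idx] = (mm->pmd[idx] & mask) | prot;
}

/*
 * Model of set_section_perms(perms, field, all): always update the
 * active mm, and also the init mm when 'all' is set.
 */
static void sketch_set_section_perms(const struct perm_sketch *perms,
				     size_t n, struct mm_sketch *active,
				     struct mm_sketch *init, bool all)
{
	for (size_t i = 0; i < n; i++) {
		for (size_t idx = perms[i].start; idx < perms[i].end; idx++) {
			sketch_section_update(active, idx, perms[i].mask,
					      perms[i].prot);
			if (all)
				sketch_section_update(init, idx, perms[i].mask,
						      perms[i].prot);
		}
	}
}
```

If the active mm and the init mm happen to be the same object, the
second update in this model is harmless because applying the same
mask and prot twice gives the same entry.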
