Re: [PATCH] arm: Use kernel mm when updating section permissions
From: Kees Cook
Date: Fri Nov 06 2015 - 14:12:07 EST
On Fri, Nov 6, 2015 at 11:08 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Fri, Nov 6, 2015 at 10:44 AM, Laura Abbott <labbott@xxxxxxxxxx> wrote:
>> On 11/05/2015 05:15 PM, Kees Cook wrote:
>>>
>>> On Thu, Nov 5, 2015 at 5:05 PM, Laura Abbott <labbott@xxxxxxxxxx> wrote:
>>>>
>>>> On 11/05/2015 08:27 AM, Russell King - ARM Linux wrote:
>>>>>
>>>>>
>>>>> On Thu, Nov 05, 2015 at 08:20:42AM -0800, Laura Abbott wrote:
>>>>>>
>>>>>>
>>>>>> On 11/05/2015 01:46 AM, Russell King - ARM Linux wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 04, 2015 at 05:00:39PM -0800, Laura Abbott wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Currently, read only permissions are not being applied even
>>>>>>>> when CONFIG_DEBUG_RODATA is set. This is because section_update
>>>>>>>> uses current->mm for adjusting the page tables. current->mm
>>>>>>>> need not be equivalent to the kernel version. Use pgd_offset_k
>>>>>>>> to get the proper page directory for updating.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> What are you trying to achieve here? You can't use these functions
>>>>>>> at run time (after the first thread has been spawned) to change
>>>>>>> permissions, because there will be multiple copies of the kernel
>>>>>>> section mappings, and those copies will not get updated.
>>>>>>>
>>>>>>> In any case, this change will probably break kexec and ftrace, as
>>>>>>> the running thread will no longer see the updated page tables.
>>>>>>>
>>>>>>
>>>>>> I think I was hitting that exact problem with multiple copies
>>>>>> not getting updated. The section_update code was being called
>>>>>> and I was seeing the tables get updated but nothing was being
>>>>>> applied when I tried to write to text or check the debugfs
>>>>>> page table. The current flow is:
>>>>>>
>>>>>> rest_init -> kernel_thread(kernel_init) and from that thread
>>>>>> mark_rodata_ro. So mark_rodata_ro is always going to happen
>>>>>> in a thread.
>>>>>>
>>>>>> Do we need to update for both init_mm and the first running
>>>>>> thread?
>>>>>
>>>>>
>>>>>
>>>>> The "first running thread" is merely coincidental for things like kexec.
>>>>>
>>>>> Hmm. Actually, I think the existing code _should_ be fine. At the
>>>>> point where mark_rodata_ro() is, we should still be using init_mm, so
>>>>> updating the current threads page tables should actually be updating
>>>>> the swapper_pg_dir.
>>>>
>>>>
>>>>
>>>> That doesn't seem to hold true. Based on what I'm seeing, we lose
>>>> the the guarantee of init_mm after the first exec. If usermodehelper
>>>> gets called to load a module, that triggers an exec and the kernel
>>>> thread is no longer using init_mm after that. I'm testing with the
>>>> multi-v7 defconfig which uses the smsc911x driver which loads a
>>>> module during initcall. That gets called before mark_rodata_ro so
>>>> the init_mm is never updated. I verified that disabling smsc911x
>>>> makes things work as expected. I suspect the testing was never done
>>>> with a driver that tried to call usermodehelper during init time.
>>>
>>>
>>> Ooooh. Nice catch. Yeah, my testing didn't include that case.
>>>
>>>> I got as far as narrowing it down that it happens after the
>>>> usermodehelper
>>>> but I wasn't able to pinpoint where exactly the switch happened. It seems
>>>> like we need to have the page tables set up before any initcalls
>>>> happen otherwise we risk having an exec create stray processes which we
>>>> can't update.
>>>
>>>
>>> Can we just make mark_rodata_ro() a no-op and do the RO setting
>>> earlier when we do the NX setting?
>>>
>>
>> Unfortunately no. The time we are doing the nx setting is before we've
>> finished
>> with the initmem so we need the initmem to be finished and freed before we
>> can
>> mark anything RO.
>>
>> More importantly, the NX settings are also not getting set. Compare before:
>>
>> ---[ Kernel Mapping ]---
>> 0xc0000000-0xc0300000 3M RW NX
>> 0xc0300000-0xc1300000 16M RW x
>> 0xc1300000-0xcc000000 173M RW NX
>> 0xcc000000-0xcc040000 256K RW NX MEM/BUFFERABLE/WC
>> 0xcc040000-0xcc100000 768K RW NX MEM/CACHED/WBRA
>> 0xcc100000-0xcc280000 1536K RW NX MEM/BUFFERABLE/WC
>> 0xcc280000-0xd0000000 62976K RW NX MEM/CACHED/WBRA
>> 0xd0000000-0xd0200000 2M RW NX
>>
>> and after
>>
>> ---[ Kernel Mapping ]---
>> 0xc0000000-0xc0300000 3M RW NX
>> 0xc0300000-0xc0c00000 9M ro x
>> 0xc0c00000-0xc1100000 5M ro NX
>> 0xc1100000-0xcc000000 175M RW NX
>> 0xcc000000-0xcc040000 256K RW NX MEM/BUFFERABLE/WC
>> 0xcc040000-0xcc100000 768K RW NX MEM/CACHED/WBRA
>> 0xcc100000-0xcc280000 1536K RW NX MEM/BUFFERABLE/WC
>> 0xcc280000-0xd0000000 62976K RW NX MEM/CACHED/WBRA
>> 0xd0000000-0xd0200000 2M RW NX
>>
>>
>> with my test patch. I think setting both current->active_mm and &init_mm
>> is sufficient. Maybe explicitly setting swapper_pg_dir would be cleaner?
>>
>> Is there a test that should be running in a CI somewhere to catch cases like
>> this where the permissions are not working as expected
>
> I wrote these for lkdtm -- actually just mentioned it here:
> http://lwn.net/Articles/663531/
>
> EXEC_DATA
> EXEC_STACK
> EXEC_KMALLOC
> EXEC_VMALLOC
> EXEC_USERSPACE
> ACCESS_USERSPACE
> WRITE_RO
> WRITE_KERN
>
> Each of those should Oops the kernel if things are working correctly.
> I'm not aware of a public CI that currently handles checking for
> expected Oops via lkdtm.
>
> The other tests I ran when building this were to turn ftrace on and
> off. If that works for you, then this patch seems fine. (AIUI, the
> code would be unchanged from original when running ftrace, so I would
> expect this to work.)
Hi Kevin and Kernel CI folks,
Could lkdtm get added to the kernel-CI workflows? Extracting and
validating Oops details when poking lkdtm would be extremely valuable
for these cases. :)
-Kees
>
> -Kees
>
>>
>> My test patch that seems to be working:
>>
>> ----8<-----
>>
>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>> index 8a63b4c..6276b234 100644
>> --- a/arch/arm/mm/init.c
>> +++ b/arch/arm/mm/init.c
>> @@ -627,12 +627,10 @@ static struct section_perm ro_perms[] = {
>> * safe to be called with preemption disabled, as under stop_machine().
>> */
>> static inline void section_update(unsigned long addr, pmdval_t mask,
>> - pmdval_t prot)
>> + pmdval_t prot, struct mm_struct *mm)
>> {
>> - struct mm_struct *mm;
>> pmd_t *pmd;
>> - mm = current->active_mm;
>> pmd = pmd_offset(pud_offset(pgd_offset(mm, addr), addr), addr);
>> #ifdef CONFIG_ARM_LPAE
>> @@ -656,7 +654,7 @@ static inline bool arch_has_strict_perms(void)
>> return !!(get_cr() & CR_XP);
>> }
>> -#define set_section_perms(perms, field) {
>> \
>> +#define set_section_perms(perms, field, all) { \
>> size_t i; \
>> unsigned long addr; \
>> \
>> @@ -674,31 +672,35 @@ static inline bool arch_has_strict_perms(void)
>> \
>> for (addr = perms[i].start; \
>> addr < perms[i].end; \
>> - addr += SECTION_SIZE) \
>> + addr += SECTION_SIZE) { \
>> section_update(addr, perms[i].mask, \
>> - perms[i].field); \
>> + perms[i].field, current->active_mm);
>> \
>> + if (all) \
>> + section_update(addr, perms[i].mask, \
>> + perms[i].field, &init_mm); \
>> + } \
>> } \
>> }
>> -static inline void fix_kernmem_perms(void)
>> +void fix_kernmem_perms(void)
>> {
>> - set_section_perms(nx_perms, prot);
>> + set_section_perms(nx_perms, prot, true);
>> }
>> #ifdef CONFIG_DEBUG_RODATA
>> void mark_rodata_ro(void)
>> {
>> - set_section_perms(ro_perms, prot);
>> + set_section_perms(ro_perms, prot, true);
>> }
>> void set_kernel_text_rw(void)
>> {
>> - set_section_perms(ro_perms, clear);
>> + set_section_perms(ro_perms, clear, false);
>> }
>> void set_kernel_text_ro(void)
>> {
>> - set_section_perms(ro_perms, prot);
>> + set_section_perms(ro_perms, prot, false);
>> }
>> #endif /* CONFIG_DEBUG_RODATA */
>>
>>
>>
>>
>>
>
>
>
> --
> Kees Cook
> Chrome OS Security
--
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/