Re: [PATCH] arm64/mm: fold check for KFENCE into can_set_direct_map()
From: Barry Song
Date: Wed Sep 21 2022 - 17:45:47 EST
On Thu, Sep 22, 2022 at 3:16 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
>
> Hi Barry,
>
> On Wed, Sep 21, 2022 at 09:00:28PM +1200, Barry Song wrote:
> > On Wed, Sep 21, 2022 at 8:26 PM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> > >
> > > From: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> > >
> > > KFENCE requires linear map to be mapped at page granularity, so that it
> > > is possible to protect/unprotect single pages, just like with
> > > rodata_full and DEBUG_PAGEALLOC.
> > >
> > > Instead of repeating
> > >
> > > can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE)
> > >
> > > make can_set_direct_map() handle the KFENCE case.
> > >
> > > This also prevents potential false positives in kernel_page_present()
> > > that may return true for non-present page if CONFIG_KFENCE is enabled.
> > >
> > > Signed-off-by: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> > > ---
> > > arch/arm64/mm/mmu.c | 8 ++------
> > > arch/arm64/mm/pageattr.c | 8 +++++++-
> > > 2 files changed, 9 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > index e7ad44585f40..c5065abec55a 100644
> > > --- a/arch/arm64/mm/mmu.c
> > > +++ b/arch/arm64/mm/mmu.c
> > > @@ -535,7 +535,7 @@ static void __init map_mem(pgd_t *pgdp)
> > > */
> > > BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end));
> > >
> > > - if (can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE))
> > > + if (can_set_direct_map())
> > > flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> > >
> > > /*
> > > @@ -1547,11 +1547,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
> > >
> > > VM_BUG_ON(!mhp_range_allowed(start, size, true));
> > >
> > > - /*
> > > - * KFENCE requires linear map to be mapped at page granularity, so that
> > > - * it is possible to protect/unprotect single pages in the KFENCE pool.
> > > - */
> > > - if (can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE))
> > > + if (can_set_direct_map())
> > > flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> > >
> > > __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> > > diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> > > index 64e985eaa52d..d107c3d434e2 100644
> > > --- a/arch/arm64/mm/pageattr.c
> > > +++ b/arch/arm64/mm/pageattr.c
> > > @@ -21,7 +21,13 @@ bool rodata_full __ro_after_init = IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED
> > >
> > > bool can_set_direct_map(void)
> > > {
> > > - return rodata_full || debug_pagealloc_enabled();
> > > + /*
> > > + * rodata_full, DEBUG_PAGEALLOC and KFENCE require linear map to be
> > > + * mapped at page granularity, so that it is possible to
> > > + * protect/unprotect single pages.
> > > + */
> > > + return rodata_full || debug_pagealloc_enabled() ||
> > > + IS_ENABLED(CONFIG_KFENCE);
> >
> > Might be irrelevant, but I wonder if rodata_full is too strict, as rodata_full
> > is almost always true since RODATA_FULL_DEFAULT_ENABLED defaults to true.
>
> Not sure I follow. If either of these conditions is true the linear map
> consists of base pages and it's possible to change attributes of each base
> page. Whenever linear map contains block mapping, page attributes cannot be
> modified.
Hi Mike,
My question is unrelated to your patch; it is more of a separate topic. I
understand we need to protect the kernel's read-only data, but it seems
overly defensive that we end up with the whole linear mapping PTE-mapped.
/sys/kernel/debug# cat kernel_page_tables
0x0000000000000000-0xffff608000000000 17179705856G PGD
0xffff608000000000-0xffff60bf40000000 253G PUD
0xffff60bf40000000-0xffff60bf40200000 2M PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60bf40200000-0xffff60bf41800000 22M PMD ro NX SHD AF NG BLK UXN MEM/NORMAL
0xffff60bf41800000-0xffff60bf41890000 576K PTE ro NX SHD AF NG UXN MEM/NORMAL
0xffff60bf41890000-0xffff60c04022e000 4171384K PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c04022e000-0xffff60c04022f000 4K PTE ro NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c04022f000-0xffff60c042a3c000 41012K PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c042a3c000-0xffff60c042a3d000 4K PTE ro NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c042a3d000-0xffff60c043431000 10192K PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c043431000-0xffff60c043432000 4K PTE ro NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c043432000-0xffff60c0448e8000 21208K PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c0448e8000-0xffff60c0448e9000 4K PTE ro NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c0448e9000-0xffff60c0448ed000 16K PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c0448ed000-0xffff60c0448ee000 4K PTE ro NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c0448ee000-0xffff60c044a6b000 1524K PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c044a6b000-0xffff60c044a6c000 4K PTE ro NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c044a6c000-0xffff60c044a74000 32K PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c044a74000-0xffff60c044a75000 4K PTE ro NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c044a75000-0xffff60c044aaa000 212K PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c044aaa000-0xffff60c044aab000 4K PTE ro NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c044aab000-0xffff60c053000000 234836K PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c053000000-0xffff60c080000000 720M PMD
0xffff60c080000000-0xffff610000000000 254G PUD
0xffff610000000000-0xffff800000000000 31T PGD
---[ Linear Mapping end ]---
For example, in the excerpt below:

0xffff60c04022e000-0xffff60c04022f000 4K PTE ro NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c04022f000-0xffff60c042a3c000 41012K PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED

a 41012K PTE-mapped region is really big. Couldn't we get it
block/cont-pte mapped with some alignment tricks, such as:

0xffff60c04022e000-0xffff60c04022f000 4K PTE ro NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c04022f000-0xffff60c040230000 4K PTE RW NX SHD AF NG UXN MEM/NORMAL-TAGGED
0xffff60c040230000-0xffff60c040400000 PTE CONT ...
0xffff60c040400000- 2MB PMD ...
>
> And rodata_full might be false because
> CONFIG_RODATA_FULL_DEFAULT_ENABLED was disabled at build time.
>
> > > }
> > >
> > > static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
> > > --
> > > 2.35.3
> > >
> >
> > Thanks
> > Barry
>
> --
> Sincerely yours,
> Mike.
Thanks
Barry