Re: [PATCH] x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y

From: Andrey Ryabinin
Date: Mon Jul 24 2017 - 11:55:48 EST




On 07/24/2017 06:37 PM, Kirill A. Shutemov wrote:
> On Mon, Jul 24, 2017 at 06:25:58PM +0300, Andrey Ryabinin wrote:
>> KASAN fills kernel page tables with repeated values to map several
>> TBs of the virtual memory to the single kasan_zero_page:
>> kasan_zero_p4d ->
>> kasan_zero_pud ->
>> kasan_zero_pmd ->
>> kasan_zero_pte ->
>> kasan_zero_page
>>
>> Walking the whole KASAN shadow range takes a lot of time, especially
>> with 5-level page tables. Since we already know that all kasan page tables
>> eventually point to the kasan_zero_page we could call note_page()
>> right away and avoid walking lower levels of the page tables.
>> This will not affect the output of the kernel_page_tables file,
>> but let us avoid spending time in page table walkers:
>>
>> Before:
>> time cat /sys/kernel/debug/kernel_page_tables > /dev/null
>>
>> real 0m55.855s
>> user 0m0.000s
>> sys 0m55.840s
>>
>> After:
>> time cat /sys/kernel/debug/kernel_page_tables > /dev/null
>>
>> real 0m0.054s
>> user 0m0.000s
>> sys 0m0.054s
>>
>> Signed-off-by: Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx>
>> ---
>> arch/x86/mm/dump_pagetables.c | 64 +++++++++++++++++++++++++++----------------
>> 1 file changed, 41 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
>> index b371ab68f2d4..5e3ac6fe6c9e 100644
>> --- a/arch/x86/mm/dump_pagetables.c
>> +++ b/arch/x86/mm/dump_pagetables.c
>> @@ -13,12 +13,12 @@
>> */
>>
>> #include <linux/debugfs.h>
>> +#include <linux/kasan.h>
>> #include <linux/mm.h>
>> #include <linux/init.h>
>> #include <linux/sched.h>
>> #include <linux/seq_file.h>
>>
>> -#include <asm/kasan.h>
>> #include <asm/pgtable.h>
>>
>> /*
>> @@ -302,23 +302,53 @@ static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr,
>> start++;
>> }
>> }
>> +#ifdef CONFIG_KASAN
>> +
>> +/*
>> + * This is an optimization for KASAN=y case. Since all kasan page tables
>> + * eventually point to the kasan_zero_page we could call note_page()
>> + * right away without walking through lower level page tables. This saves
>> + * us dozens of seconds (minutes for 5-level config) while checking for
>> + * W+X mapping or reading kernel_page_tables debugfs file.
>> + */
>> +static inline bool kasan_page_table(struct seq_file *m, struct pg_state *st,
>> + void *pt)
>> +{
>> + if (__pa(pt) == __pa(kasan_zero_pmd) ||
>> +#ifdef CONFIG_X86_5LEVEL
>> + __pa(pt) == __pa(kasan_zero_p4d) ||
>> +#endif
>> + __pa(pt) == __pa(kasan_zero_pud)) {
>> + pgprotval_t prot = pte_flags(kasan_zero_pte[0]);
>> + note_page(m, st, __pgprot(prot), 5);
>
> Hm. I don't think '5' is correct here. Shouldn't it be dependent on what
> page table we detected? Or just take it from caller?
>

No, 5 is correct, because all kasan page tables end up as a pte mapping of kasan_zero_page,
and pte is level 5.
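
(For reference, the last column of the dump is printed from the dumper's level name table,
which -- if I read dump_pagetables.c correctly, so take this as a rough sketch from memory --
is indexed by that level argument:

	static const char * const level_name[] =
		{ "cr3", "pgd", "p4d", "pud", "pmd", "pte" };	/* index 5 == "pte" */

so passing anything smaller than 5 would print a different level name.)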

Anything but 5 would give us incorrect output in the last column of the dump:
pmd, pud or p4d instead of the correct pte.

Without the patch, if we stumble on a kasan page table we always end up with a note_page(..., 5) call.
With this patch we just avoid the useless walk through the lower levels of the page tables: since we already
know the end result, we call note_page(..., 5) right away.
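
The call-site hunks aren't quoted above, but roughly speaking each walker just tries the shortcut
before descending, e.g. in walk_pmd_level() (sketch only, pmd_start here is the pmd table taken
from the pud entry):

	/* inside the per-entry loop of walk_pmd_level(), sketch only */
	if (!pmd_none(*start)) {
		if (pmd_large(*start) || !pmd_present(*start)) {
			prot = pmd_flags(*start);
			note_page(m, st, __pgprot(prot), 4);
		} else if (!kasan_page_table(m, st, pmd_start)) {
			/* not the kasan zero pmd -> really walk the ptes */
			walk_pte_level(m, st, *start, P + i * PMD_LEVEL_MULT);
		}
	} else
		note_page(m, st, __pgprot(0), 4);

walk_pud_level()/walk_p4d_level() do the same thing against kasan_zero_pud/kasan_zero_p4d.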