Re: [PATCH] x86/mm/ptdump: Fix soft lockup in page table walker.
From: Mark Rutland
Date: Fri Feb 10 2017 - 09:45:40 EST

On Fri, Feb 10, 2017 at 05:38:20PM +0300, Andrey Ryabinin wrote:
> On 02/10/2017 05:29 PM, Mark Rutland wrote:
> > On Fri, Feb 10, 2017 at 04:56:19PM +0300, Andrey Ryabinin wrote:
> >> On 02/10/2017 04:02 PM, Dmitry Vyukov wrote:
> >>> On Fri, Feb 10, 2017 at 1:15 PM, Andrey Ryabinin
> >>> <aryabinin@xxxxxxxxxxxxx> wrote:
> >>>> On 02/10/2017 02:18 PM, Thomas Gleixner wrote:
> >>>>> On Fri, 10 Feb 2017, Dmitry Vyukov wrote:
> >
> >> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> >> index 8aa6bea..1599a5c 100644
> >> --- a/arch/x86/mm/dump_pagetables.c
> >> +++ b/arch/x86/mm/dump_pagetables.c
> >> @@ -373,6 +373,11 @@ static inline bool is_hypervisor_range(int idx)
> >> #endif
> >> }
> >>
> >> +static bool pgd_already_checked(pgd_t *prev_pgd, pgd_t *pgd, bool checkwx)
> >> +{
> >> + return checkwx && prev_pgd && (pgd_val(*prev_pgd) == pgd_val(*pgd));
> >> +}
> >> +
> >> static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
> >> bool checkwx)
> >> {
> >> @@ -381,6 +386,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
> >> #else
> >> pgd_t *start = swapper_pg_dir;
> >> #endif
> >> + pgd_t *prev_pgd = NULL;
> >> pgprotval_t prot;
> >> int i;
> >> struct pg_state st = {};
> >> @@ -396,7 +402,8 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
> >>
> >> for (i = 0; i < PTRS_PER_PGD; i++) {
> >> st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
> >> - if (!pgd_none(*start) && !is_hypervisor_range(i)) {
> >> + if (!pgd_none(*start) && !is_hypervisor_range(i) &&
> >> + !pgd_already_checked(prev_pgd, start, checkwx)) {
> >
> > This means we'll fall into the else case...
> >
> >> if (pgd_large(*start) || !pgd_present(*start)) {
> >> prot = pgd_flags(*start);
> >> note_page(m, &st, __pgprot(prot), 1);
> >> @@ -408,6 +415,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
> >> note_page(m, &st, __pgprot(0), 1);
> >
> > ... i.e. the note_page() here, where we'll claim that there's nothing
> > present due to the empty prot.
> >
> > That'll give erroneous output for the userspace pagetable dumps, so I do
> > not think this is quite right, even though it gives a boot-time speedup.
> >
>
> For userspace pagetable dumps checkwx is false, so
> pgd_already_checked() will return false and we will not go into the
> else branch. Userspace pagetable dumps work as before.

Ah. I missed that; sorry for the noise.

That sounds ok then, though it's probably worth a comment as to what
we're doing this for.
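
To illustrate the sort of comment meant here, a minimal standalone
sketch follows. It is a userspace model, not the kernel code: pgd_t and
pgd_val() are hypothetical stand-ins, count_walked() is an illustrative
helper that does not exist in dump_pagetables.c, and updating prev on
every iteration is an assumption, since the quoted hunks do not show
where prev_pgd is assigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's pgd_t; not the real definition. */
typedef struct { unsigned long pgd; } pgd_t;

static unsigned long pgd_val(pgd_t p)
{
	return p.pgd;
}

/*
 * When only checking for W+X mappings (checkwx == true), an entry whose
 * value is identical to the previous one describes the same lower-level
 * tables, so walking it again cannot find anything new. Skip it to keep
 * the boot-time check short.
 *
 * For a real dump (checkwx == false) never skip, otherwise the skipped
 * range would be reported as empty.
 */
static bool pgd_already_checked(const pgd_t *prev_pgd, const pgd_t *pgd,
				bool checkwx)
{
	return checkwx && prev_pgd && pgd_val(*prev_pgd) == pgd_val(*pgd);
}

/* Illustrative helper: count the entries a walk would descend into. */
static size_t count_walked(const pgd_t *table, size_t n, bool checkwx)
{
	const pgd_t *prev = NULL;
	size_t walked = 0;

	assert(table != NULL);

	for (size_t i = 0; i < n; i++) {
		/* A zero value models pgd_none() in this sketch. */
		if (table[i].pgd != 0 &&
		    !pgd_already_checked(prev, &table[i], checkwx))
			walked++;
		/* Assumption: the previous entry is tracked unconditionally. */
		prev = &table[i];
	}
	return walked;
}
```

With a table containing a run of identical entries, count_walked()
descends into the run once when checkwx is true and into every entry
when it is false, which is the behaviour Andrey describes above.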

Thanks,
Mark.