Re: [PATCH v15 00/23] Generic page walk and ptdump

From: Steven Price
Date: Wed Dec 04 2019 - 11:32:45 EST


On Wed, Dec 04, 2019 at 02:56:58PM +0000, David Hildenbrand wrote:
> On 04.12.19 15:54, Qian Cai wrote:
> >
> >
> >> On Dec 3, 2019, at 6:02 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 06.11.19 16:05, Steven Price wrote:
> >>> On 06/11/2019 13:31, Qian Cai wrote:
> >>>>
> >>>>
> >>>>> On Nov 4, 2019, at 2:35 PM, Qian Cai <cai@xxxxxx> wrote:
> >>>>>
> >>>>> On Fri, 2019-11-01 at 14:09 +0000, Steven Price wrote:
> >>> [...]
> >>>>>> Changes since v14:
> >>>>>> https://lore.kernel.org/lkml/20191028135910.33253-1-steven.price@xxxxxxx/
> >>>>>> * Switch walk_page_range() into two functions, the existing
> >>>>>> walk_page_range() now still requires VMAs (and treats areas without a
> >>>>>> VMA as a 'hole'). The new walk_page_range_novma() ignores VMAs and
> >>>>>> will report the actual page table layout. This fixes the previous
> >>>>>> breakage of /proc/<pid>/pagemap
> >>>>>> * New patch at the end of the series which reduces the 'level' numbers
> >>>>>> by 1 to simplify the code slightly
> >>>>>> * Added tags
> >>>>>
> >>>>> Does this new version also take care of this boot crash seen with v14? Suppose
> >>>>> it is now breaking CONFIG_EFI_PGT_DUMP=y? The full config is,
> >>>>>
> >>>>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
> >>>>>
> >>>>
> >>>> V15 is indeed DOA here.
> >>>
> >>> Thanks for finding this, it looks like EFI causes issues here. The below fixes
> >>> this for me (booting in QEMU).
> >>>
> >>> Andrew: do you want me to send out the entire series again for this fix, or
> >>> can you squash this into mm-pagewalk-allow-walking-without-vma.patch?
> >>>
> >>> Thanks,
> >>>
> >>> Steve
> >>>
> >>> ---8<---
> >>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> >>> index c7529dc4f82b..70dcaa23598f 100644
> >>> --- a/mm/pagewalk.c
> >>> +++ b/mm/pagewalk.c
> >>> @@ -90,7 +90,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
> >>> split_huge_pmd(walk->vma, pmd, addr);
> >>> if (pmd_trans_unstable(pmd))
> >>> goto again;
> >>> - } else if (pmd_leaf(*pmd)) {
> >>> + } else if (pmd_leaf(*pmd) || !pmd_present(*pmd)) {
> >>> continue;
> >>> }
> >>>
> >>> @@ -141,7 +141,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
> >>> split_huge_pud(walk->vma, pud, addr);
> >>> if (pud_none(*pud))
> >>> goto again;
> >>> - } else if (pud_leaf(*pud)) {
> >>> + } else if (pud_leaf(*pud) || !pud_present(*pud)) {
> >>> continue;
> >>> }
> >>>
> >>>
> >>
> >> Even with this fix, booting for me under QEMU fails. See
> >>
> >> https://lore.kernel.org/linux-mm/b7ce62f2-9a48-6e48-6685-003431e521aa@xxxxxxxxxx/
> >>
> >
> > Yes, for some reasons, this starts to crash on almost all arches here, so it might be worth
> > for Andrew to revert those in the meantime while allowing Steven to rework.
>
> I agree, this produces too much noise.

I've bisected this problem and it's a merge conflict with:

ace88f1018b8 ("mm: pagewalk: Take the pagetable lock in walk_pte_range()")

Reverting that commit "fixes" the problem. That commit adds a call to
pte_offset_map_lock(), however that isn't necessarily safe when
considering an "unusual" mapping in the kernel. Combined with my patch
set this leads to the BUG when walking the kernel's page tables.

At this stage I think it's best if Andrew drops my series and I'll try
to rework it on top -rc1 fixing up this conflict and the other x86
32-bit issue that has cropped up.

Steve