Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting

From: Hugh Dickins
Date: Fri Sep 01 2023 - 18:48:40 EST


On Fri, 1 Sep 2023, Mikhail Gavrilov wrote:
> On Fri, Sep 1, 2023 at 2:08 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> >
> >
> > Sorry about that, please try this instead, adds EXPORT_SYMBOL(pte_unmap).
> >
>
> Thanks, now I have a working kernel builded at commit a349d72fd9ef.
>
> > I've never used stackdepot before, but I've tried this out in good and
> > bad cases, and expect it to work for you, shedding light on where is
> > going wrong - machine should boot up fine, and in dmesg you'll find one
> > stacktrace between "WARNING: pte_map..." and "End of pte_map..." lines.
>
> Interesting, I checked twice but I didn't find any entry with
> "pte_map" in the kernel log after applying your patch.

That was very disappointing: I found it hard to explain, but was thinking
of sending you a similar patch, doing the same check on all your 32 CPUs -
maybe the stall being on CPU 0 in your photo was accidental.

But now I think I have the shameful answer (which studying your dmesg,
and the 82328 jiffies at 86 seconds in your photo, did help me towards).

That mm/pagewalk fix I put into 6.5 has a grievous oversight (and a
video of your failing 6.6 bootup would likely have shown a WARN_ON_ONCE
from the underflow in __rcu_read_unlock()).

Please revert the debug patch I sent yesterday (or earlier today), please
try booting with this one on top of a349d72fd9ef; and if that's successful,
then please go back to your original Rawhide tree and apply this on top of
that, to confirm that boots to a working system too - thanks.

With my apologies,

[PATCH] mm/pagewalk: fix bootstopping regression from extra pte_unmap()

[ Commit message yet to be written: it's actually something to go to
6.5 stable, to correct i386 CONFIG_HIGHPTE there - though we know of
no case where it is actually hit. ]

Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
---
mm/pagewalk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 2022333805d3..9e7d0276c38a 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -58,7 +58,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
pte = pte_offset_map(pmd, addr);
if (pte) {
err = walk_pte_range_inner(pte, addr, end, walk);
- if (walk->mm != &init_mm)
+ if (walk->mm != &init_mm && addr < TASK_SIZE)
pte_unmap(pte);
}
} else {
--
2.35.3