syzbot reports an oops in lockdep's __lock_acquire(), called from
__pte_offset_map_lock() called from filemap_map_pages(); or, when I run
the repro, the oops comes in pmd_install(), called from filemap_map_pmd()
called from filemap_map_pages(), just before the __pte_offset_map_lock().

The problem is that filemap_map_pmd() has been assuming that when it
finds pmd_none(), a page table has already been prepared in prealloc_pte;
and indeed do_fault_around() has been careful to preallocate one there,
when it finds pmd_none(): but what if *pmd became none in between?
My 6.6 mods in mm/khugepaged.c, avoiding mmap_lock for write, have made
it easy for *pmd to be cleared while servicing a page fault; but even
before those, a huge *pmd might be zapped while a fault is serviced.
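A minimal userspace model of this check-then-act race may help. It is a sketch under stated assumptions: struct pte_table, struct fault_model and every model_* helper below are hypothetical stand-ins, not kernel code. A NULL pmd models pmd_none(), and any non-NULL pmd models a populated (possibly huge) entry. The guarded variant shows one way to refuse to install a table that was never preallocated; it is an illustration, not necessarily the change made by this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page table page (not the kernel's type). */
struct pte_table {
	int unused;
};

/* Hypothetical fault state: a NULL pmd models pmd_none(). */
struct fault_model {
	struct pte_table *pmd;
	struct pte_table *prealloc_pte;
};

/* Models do_fault_around(): preallocate only if *pmd is none right now. */
static void model_fault_around_prealloc(struct fault_model *f,
					struct pte_table *spare)
{
	if (f->pmd == NULL)
		f->prealloc_pte = spare;
}

/* Models a concurrent zap or khugepaged collapse clearing *pmd. */
static void model_concurrent_clear(struct fault_model *f)
{
	f->pmd = NULL;
}

/* True if assuming "pmd_none() implies prealloc_pte" would install NULL. */
static bool model_would_install_null(const struct fault_model *f)
{
	return f->pmd == NULL && f->prealloc_pte == NULL;
}

/*
 * Guarded variant: install only a table that was really preallocated.
 * Returns false when *pmd is none but nothing was preallocated, leaving
 * the caller to back out (a hypothetical policy for this sketch).
 */
static bool model_map_pmd_guarded(struct fault_model *f)
{
	if (f->pmd == NULL) {
		if (f->prealloc_pte == NULL)
			return false;
		f->pmd = f->prealloc_pte;
		f->prealloc_pte = NULL;
	}
	return true;
}
```

In the problematic interleaving, fault-around sees a populated (huge) *pmd and skips preallocation, then a concurrent zap clears *pmd, so the old assumption would hand a NULL table to the install step.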

The difference in symptomatic stack traces comes from the "memory model"
in use: pmd_install() uses pmd_populate(), which uses page_to_pfn(). In some
models that is strict, and will oops on the NULL prealloc_pte; in other
models, it will construct a bogus value to be populated into *pmd, and then
__pte_offset_map_lock() oopses when trying to access the split ptlock pointer
(or shows some other symptom in the normal case, where the ptlock is embedded
rather than a pointer).
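The two behaviours can be illustrated with a hedged model of page_to_pfn() applied to a NULL page. It assumes that flat-style models (FLATMEM, SPARSEMEM_VMEMMAP) derive the pfn by subtracting a base array address, while classic SPARSEMEM must first read page->flags to find the section. struct page here and both model_* functions are hypothetical simplifications, not the kernel's definitions. Addresses are passed as uintptr_t so that the NULL case wraps predictably instead of relying on undefined pointer arithmetic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page; only flags matters here. */
struct page {
	unsigned long flags;
};

/*
 * Flat-style model (assumed FLATMEM / SPARSEMEM_VMEMMAP behaviour): the pfn
 * is essentially the page's index in one array, so a NULL page silently
 * yields a huge, bogus pfn instead of faulting.
 */
static uintptr_t model_flat_page_to_pfn(uintptr_t page, uintptr_t mem_map)
{
	return (page - mem_map) / sizeof(struct page);
}

/*
 * Strict-style model (assumed classic SPARSEMEM behaviour): page->flags is
 * read first to find the section, so a NULL page faults. This sketch reports
 * that fault by returning false instead of crashing.
 */
static bool model_strict_page_to_pfn(const struct page *page,
				     const struct page *section_map,
				     unsigned long *pfn)
{
	if (page == NULL)
		return false; /* the kernel would oops reading page->flags */
	(void)page->flags; /* stands in for decoding the section number */
	*pfn = (unsigned long)(page - section_map);
	return true;
}
```

A bogus pfn from the flat-style path is what would then be populated into *pmd, leaving the later ptlock access to fail.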

Reported-and-tested-by: syzbot+89edd67979b52675ddec@xxxxxxxxxxxxxxxxxxxxxxxxx
Closes: https://lore.kernel.org/linux-mm/0000000000005e44550608a0806c@xxxxxxxxxx/
Link: https://lore.kernel.org/linux-mm/20231115065506.19780-1-jose.pekkarinen@xxxxxxxxxxx/
Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")