Re: [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction

From: David Hildenbrand
Date: Tue Feb 11 2025 - 04:38:08 EST


On 11.02.25 10:29, Qi Zheng wrote:


On 2025/2/11 17:14, David Hildenbrand wrote:
On 11.02.25 04:45, Qi Zheng wrote:
Hi Russell,

On 2025/2/11 01:03, Russell King (Oracle) wrote:
On Mon, Feb 10, 2025 at 05:49:38PM +0100, Ezra Buehler wrote:
When running vanilla Linux 6.13 or newer (6.14-rc2) on the
AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL pointer
dereference resulting in a kernel panic. The culprit seems to be commit
fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()").
Reverting the commit apparently fixes the issue.

The blamed commit is buggy:

arch/arm/include/asm/tlbflush.h:
#define update_mmu_cache(vma, addr, ptep) \
          update_mmu_cache_range(NULL, vma, addr, ptep, 1)

So vmf can be NULL. This didn't matter before this commit,
because vmf was not used by ARM's update_mmu_cache_range(). However,
the commit introduced a dereference of it, which now causes a NULL
pointer dereference.

Not sure what the correct solution is, but at a guess, both:

    if (ptl != vmf->ptl)

need to become:

    if (!vmf || ptl != vmf->ptl)

No, we can't do that, because without using split PTE locks, we would
use shared mm->page_table_lock, which would create a deadlock.
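
The deadlock concern above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct model_lock, struct model_vmf and every function name below are hypothetical stand-ins, not kernel types or APIs, and a "held" flag replaces a real spinlock. It shows that the proposed check decides to lock whenever vmf is NULL, and that without split PTE locks the lock in question is the shared one the caller already holds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are the kernel's types. */
struct model_lock {
	bool held;
};

struct model_vmf {
	struct model_lock *ptl;	/* lock the fault path already holds */
};

/* The check as proposed in the thread: take ptl unless it is vmf->ptl. */
static bool proposed_should_lock(const struct model_vmf *vmf,
				 const struct model_lock *ptl)
{
	return !vmf || ptl != vmf->ptl;
}

/*
 * Returns true if acting on the proposed check would try to take a lock
 * that is already held, which for a spinlock means a self-deadlock.
 */
static bool would_self_deadlock(const struct model_vmf *vmf,
				const struct model_lock *ptl)
{
	return proposed_should_lock(vmf, ptl) && ptl->held;
}

/* Scenario: no split PTE locks, so every PTE page shares one lock. */
static bool shared_lock_null_vmf_deadlocks(void)
{
	/* Models mm->page_table_lock, already held by the caller. */
	struct model_lock page_table_lock = { .held = true };

	/* update_mmu_cache() passes vmf == NULL, as the macro above shows. */
	return would_self_deadlock(NULL, &page_table_lock);
}

/* Scenario: split PTE locks, and the other PTE page has its own free lock. */
static bool split_lock_other_page_deadlocks(void)
{
	struct model_lock fault_ptl = { .held = true };
	struct model_lock other_ptl = { .held = false };
	struct model_vmf vmf = { .ptl = &fault_ptl };

	return would_self_deadlock(&vmf, &other_ptl);
}
```

In this model the first scenario reports a self-deadlock and the second does not. The model does not cover the case where vmf is NULL and split PTE locks are enabled.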

Maybe we can simply special-case on CONFIG_SPLIT_PTE_PTLOCKS?

if (IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS)) {

In this case, if two vmas map the same PTE page, then the same PTE lock
will be held repeatedly. Right?

Hmm, the comment says:

/*
 * This is called while another page table is mapped, so we
 * must use the nested version. This also means we need to
 * open-code the spin-locking.
 */

"another page table" implies that it cannot be the same. But maybe that comment was also wrong?


--
Cheers,

David / dhildenb