Re: [PATCHv3] mm: fix incorrect vbq reference in purge_fragmented_block

From: Baoquan He
Date: Fri May 31 2024 - 22:34:38 EST


On 05/31/24 at 10:04am, Uladzislau Rezki wrote:
> On Fri, May 31, 2024 at 11:05:20AM +0800, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> >
> > The vmalloc area runs out on our ARM64 system during an erofs test as
> > vm_map_ram() failed[1]. Following the debug log, we find that
> > vm_map_ram()->vb_alloc() allocates a new vb->va, which corresponds
> > to a 4MB vmalloc area, because list_for_each_entry_rcu returns
> > immediately when vbq->free->next points to vbq->free. That is to say,
> > 65536 page faults after the list is broken will exhaust the whole
> > vmalloc area. This should be caused by a vbq->free->next that points
> > to vbq->free, which prevents list_for_each_entry_rcu from iterating
> > the list and finding the BUG.
> >
> > [1]
> > PID: 1 TASK: ffffff80802b4e00 CPU: 6 COMMAND: "init"
> > #0 [ffffffc08006afe0] __switch_to at ffffffc08111d5cc
> > #1 [ffffffc08006b040] __schedule at ffffffc08111dde0
> > #2 [ffffffc08006b0a0] schedule at ffffffc08111e294
> > #3 [ffffffc08006b0d0] schedule_preempt_disabled at ffffffc08111e3f0
> > #4 [ffffffc08006b140] __mutex_lock at ffffffc08112068c
> > #5 [ffffffc08006b180] __mutex_lock_slowpath at ffffffc08111f8f8
> > #6 [ffffffc08006b1a0] mutex_lock at ffffffc08111f834
> > #7 [ffffffc08006b1d0] reclaim_and_purge_vmap_areas at ffffffc0803ebc3c
> > #8 [ffffffc08006b290] alloc_vmap_area at ffffffc0803e83fc
> > #9 [ffffffc08006b300] vm_map_ram at ffffffc0803e78c0
> >
> > Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
> >
> > Suggested-by: Hailong.Liu <hailong.liu@xxxxxxxx>
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> >
> Is the problem related to running out of vmalloc space _only_, or is it a
> problem with a broken list? From the commit message it is hard to follow the reason.

This should fix the broken list.
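
To illustrate the symptom described in the commit message, here is a
minimal userspace sketch, not the kernel code. It assumes a simplified
struct list_head and hypothetical helpers (list_init, list_add_tail,
count_reachable, corrupt_head) that only mimic the termination rule of
list_for_each_entry*(): once head->next points back at head, the walk
ends before visiting anything, so every lookup concludes the free list
is empty and a new block has to be allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty circular list: head points at itself in both directions. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
	assert(node != NULL && head != NULL);
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/*
 * Walk the list with the same stop condition as list_for_each_entry*():
 * the loop ends as soon as the cursor is back at head.
 */
static size_t count_reachable(const struct list_head *head)
{
	const struct list_head *pos;
	size_t n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/*
 * Hypothetical corruption used only for this illustration: head->next is
 * pointed back at head while entries are still linked behind head->prev.
 */
static void corrupt_head(struct list_head *head)
{
	head->next = head;
}
```

With two entries added, count_reachable() sees both. After
corrupt_head() it sees none, although head->prev still points at the
last entry, so the existing blocks are stranded rather than freed.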

Hi Zhaoyang and Hailong,

Could either of you test the patch below in your testing environment?