Re: [PATCH] mm: fix a race scenario in folio_isolate_lru

From: Zhaoyang Huang
Date: Sun Mar 17 2024 - 00:08:01 EST


On Sat, Mar 16, 2024 at 10:59 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Sat, Mar 16, 2024 at 04:53:09PM +0800, Zhaoyang Huang wrote:
> > On Fri, Mar 15, 2024 at 8:46 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Thu, Mar 14, 2024 at 04:39:21PM +0800, zhaoyang.huang wrote:
> > > > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> > > >
> > > > A panic [1] was reported, caused by corruption of the lruvec list. Fix
> > > > the race between folio_isolate_lru and release_pages.
> > > >
> > > > race condition:
> > > > release_pages could meet an unreferenced folio which escaped being
> > > > deleted from the LRU but was added to another list_head
> > >
> > > I don't think the bug is in folio_isolate_lru() but rather in its
> > > caller.
> > >
> > > * Context:
> > > *
> > > * (1) Must be called with an elevated refcount on the folio. This is a
> > > * fundamental difference from isolate_lru_folios() (which is called
> > > * without a stable reference).
> > >
> > > So when release_pages() runs, it must not see a refcount decremented to
> > > zero, because the caller of folio_isolate_lru() is supposed to hold one.
> > >
> > > Your stack trace is for the thread which is calling release_pages(), not
> > > the one calling folio_isolate_lru(), so I can't help you debug further.
> > Thanks for the comments. According to my understanding,
> > folio_put_testzero does the decrement before the test, which makes it
> > possible for release_pages to see refcnt equal to zero and proceed
> > further (folio_get in folio_isolate_lru has not run yet).
>
> No, that's not possible.
>
> In the scenario below, at entry to folio_isolate_lru(), the folio has
> refcount 2. It has one refcount from thread 0 (because it must own one
> before calling folio_isolate_lru()) and it has one refcount from thread 1
> (because it's about to call release_pages()). If release_pages() were
> not running, the folio would have refcount 3 when folio_isolate_lru()
> returned.
Could it be this scenario, where the folio is reached via the pte (thread 0),
a local fbatch (thread 1) and the page cache (thread 2) concurrently, and the
three proceed interleaved without any lock protecting them? Actually, IMO,
thread 1 could also see the folio with refcnt==1, since it doesn't care
whether the folio is in the page cache or not.

madvise_cold_and_pageout does no explicit folio_get, since the folio comes
from the pte, which implies it already has one refcnt from the page cache.

#thread 0                           #1                                    #2
(madvise_cold_and_pageout)          (lru_add_drain->fbatch_release_pages) (read_pages->filemap_remove_folios)
refcnt == 1(represent page cache)   refcnt==2(another one represent LRU)  folio comes from page cache
folio_isolate_lru                   release_pages                         filemap_free_folio
                                                                          refcnt==1(decrease the one of page cache)
                                    folio_put_testzero == true
                                    <No lruvec_del_folio>
                                    list_add(folio->lru, pages_to_free)
                                    //current folio will break LRU's integrity since it has not been deleted

In case gmail wraps the lines, here is the chart above split into two parts:

#thread 0                           #1
(madvise_cold_and_pageout)          (lru_add_drain->fbatch_release_pages)
refcnt == 1(represent page cache)   refcnt==2(another one represent LRU)
folio_isolate_lru                   release_pages
                                    folio_put_testzero == true
                                    <No lruvec_del_folio>
                                    list_add(folio->lru, pages_to_free)
                                    //current folio will break LRU's integrity
                                    //since it has not been deleted

#1                                      #2
(lru_add_drain->fbatch_release_pages)   (read_pages->filemap_remove_folios)
refcnt==2(another one represent LRU)    folio comes from page cache
release_pages                           filemap_free_folio
                                        refcnt==1(decrease the one of page cache)
folio_put_testzero == true
<No lruvec_del_folio>
list_add(folio->lru, pages_to_free)
//current folio will break LRU's integrity since it has not been deleted
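
To make the refcount arithmetic in the charts easier to check, here is a
minimal userspace sketch. It is not kernel code: struct toy_folio,
toy_put_testzero(), toy_get() and replay() are hypothetical names made up
for illustration, and the steps run in program order rather than on real
threads. It only counts references for the interleaving drawn above, once
without and once with thread 0 holding its own reference. It says nothing
about whether that interleaving is actually reachable in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct folio: only the reference count is modeled. */
struct toy_folio {
	atomic_int refcount;
};

/* Decrement first, then test, like the folio_put_testzero() step in the chart. */
static bool toy_put_testzero(struct toy_folio *f)
{
	/* atomic_fetch_sub() returns the value held before the subtraction. */
	return atomic_fetch_sub(&f->refcount, 1) == 1;
}

/* Take one extra reference. */
static void toy_get(struct toy_folio *f)
{
	atomic_fetch_add(&f->refcount, 1);
}

/*
 * Replay the chart in program order.
 * caller_holds_ref: whether "thread 0" took its own reference before
 * reaching its isolate step (an assumption of this sketch).
 * Returns the refcount left after threads 2 and 1 have dropped theirs.
 */
static int replay(bool caller_holds_ref)
{
	struct toy_folio f;
	bool freed;

	/* Chart's starting point: page cache ref + the one attributed to the LRU. */
	atomic_init(&f.refcount, 2);

	if (caller_holds_ref)
		toy_get(&f);		/* thread 0's own reference */

	freed = toy_put_testzero(&f);	/* thread 2: filemap_free_folio */
	assert(!freed);

	freed = toy_put_testzero(&f);	/* thread 1: release_pages */
	/* The count reaches zero here only if nobody else holds a reference. */
	assert(freed == !caller_holds_ref);

	return atomic_load(&f.refcount);
}
```

In this model replay(false) ends at zero, which is the case drawn in the
charts, while replay(true) ends at one, which is the case where the caller
of folio_isolate_lru holds its own reference.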
>
> > #0 folio_isolate_lru          #1 release_pages
> > BUG_ON(!folio_refcnt)
> >                               if (folio_put_testzero())
> > folio_get(folio)
> > if (folio_test_clear_lru())