Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU

From: Zi Yan

Date: Wed Jun 10 2026 - 16:42:30 EST


On 10 Jun 2026, at 16:30, Andrew Morton wrote:

> On Wed, 10 Jun 2026 20:05:35 +0800 "zhaoyang.huang" <zhaoyang.huang@xxxxxxxxxx> wrote:
>
>> From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
>>
>> Kernel panics keep being reported, especially when the f2fs
>> partition gets almost full. On investigation, we found that the cause is
>> an f2fs page being freed to buddy without being deleted from the LRU, and
>> the root cause is the race in [2], which was introduced by this commit.
>> We solve this issue by reverting f2fs commit 9609dd704725 ("f2fs: remove
>> non-uptodate folio from the page cache in move_data_block").
>>
>> There are 3 racing processes in this scenario; their main activities are
>> described below. However, on further investigation of the code, I
>> think there is a common race window for truncated folios between
>> split_folio_to_order and folio_isolate_lru, where the folios lose the
>> page-cache refcount and retain only the transient one held by the split
>> caller, under which a folio can enter the free path and compete with the
>> isolation process. This commit suggests keeping folios beyond EOF off
>> the LRU.
>>
>> Truncate:
>> The changed code in move_data_block() lets the GC path evict the tail-end
>> folio from the page cache through folio_end_dropbehind(). Once
>> folio_unmap_invalidate() removes the folio from mapping->i_pages, the
>> page-cache references for all pages in the folio are dropped. The folio
>> is then kept alive only by temporary external references, which allows a
>> later split to operate on a folio whose subpages are no longer protected
>> by page-cache references.
>>
>> Split:
>> After the page-cache references are gone, split_folio_to_order() can
>> split the big folio into individual pages and put the resulting subpages
>> back on the LRU. For tail pages beyond EOF, split removes them from the
>> page cache and drops their page-cache references. A tail page can then
>> remain on the LRU with PG_lru set while holding only the split caller's
>> temporary reference. When free_folio_and_swap_cache() drops that final
>> reference, the page enters the final folio_put() release path.
>>
>> Isolate:
>> In parallel, folio_isolate_lru() can observe the same tail page with a
>> non-zero refcount and PG_lru set. It clears PG_lru before taking its own
>> reference. If this races with the final folio_put() from the split path,
>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
>> The page is then freed back to the allocator while its lru links are
>> still present in the LRU list. A later LRU operation on a neighboring
>> page detects the stale link and reports list corruption.
>
> Thanks. Sashiko AI review might have found some problems with folio
> flags:
>
> https://sashiko.dev/#/patchset/20260610120535.2370844-1-zhaoyang.huang@xxxxxxxxxx

Claude also raised the same concern when I was reasoning about this issue.

At least for now, my conclusion is that the race between folio_split()
and folio_isolate_lru() should not cause the issue and something else
is wrong.

Best Regards,
Yan, Zi