Re: [PATCH V4] mm/gup: Clear the LRU flag of a page before adding to LRU batch

From: David Hildenbrand
Date: Wed Mar 26 2025 - 08:46:47 EST


On 26.03.25 13:42, Jinjiang Tu wrote:
Hi,


Hi!

We noticed a 12.3% performance regression for the LibMicro pwrite testcase due to
commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch").

The testcase is executed as follows, and the file is a tmpfs file.
pwrite -E -C 200 -L -S -W -N "pwrite_t1k" -s 1k -I 500 -f $TFILE

Do we know how much that reflects real workloads? (IOW, how much should we care)


This testcase writes 1KB (only one page) to the tmpfs file and repeats this step many times. The flame
graph shows the performance regression comes from folio_mark_accessed() and workingset_activation().

folio_mark_accessed() is called for the same page many times. Before this patch, each call added the
page to cpu_fbatches.activate. Once the fbatch was full, it was drained and the page was promoted to
the active list. After that, folio_mark_accessed() did nothing in later calls.

But after this patch, the folio's LRU flag is cleared when it is added to cpu_fbatches.activate. From then on,
folio_mark_accessed() never calls folio_activate() again because the page has no LRU flag, so the fbatch
does not fill up and the folio is never marked active. As a result, later folio_mark_accessed() calls
always reach workingset_activation(), leading to the performance regression.
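
For context, a simplified and abridged sketch of the relevant path (paraphrased from
folio_mark_accessed() in mm/swap.c, not the exact code), showing where the two behaviors diverge:

void folio_mark_accessed(struct folio *folio)
{
	...
	} else if (!folio_test_active(folio)) {
		if (folio_test_lru(folio))
			/*
			 * LRU flag set: queue the folio on
			 * cpu_fbatches.activate; once that batch is drained,
			 * PG_active is set and later calls bail out early.
			 */
			folio_activate(folio);
		else
			/*
			 * LRU flag cleared (the folio already sits in a
			 * per-CPU batch): we end up here on every call.
			 */
			__lru_cache_activate_folio(folio);
		folio_clear_referenced(folio);
		/* ... and workingset_activation() runs each time as well. */
		workingset_activation(folio);
	}
	...
}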

Would there be a good place to drain the LRU to effectively get that processed? (we can always try draining if the LRU flag is not set)
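
Something along these lines is what I have in mind; a rough, untested sketch only, and whether
folio_mark_accessed() is the right place for it (and whether the drain cost is acceptable there)
is exactly the question:

	} else if (!folio_test_active(folio)) {
		if (folio_test_lru(folio)) {
			folio_activate(folio);
		} else {
			/*
			 * The folio may be sitting in a per-CPU fbatch with
			 * its LRU flag cleared; draining (this CPU's batches
			 * only) moves it to the active list, so later calls
			 * take the early exit instead of reaching
			 * workingset_activation() again.
			 */
			lru_add_drain();
		}
		folio_clear_referenced(folio);
		workingset_activation(folio);
	}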


--
Cheers,

David / dhildenb