Re: [RFC PATCH v4 0/6] Promotion of Unmapped Page Cache Folios.

From: IBM
Date: Sat Apr 12 2025 - 09:09:27 EST


Gregory Price <gourry@xxxxxxxxxx> writes:

> On Sat, Apr 12, 2025 at 01:35:56AM +0100, Matthew Wilcox wrote:
>> On Fri, Apr 11, 2025 at 08:09:55PM -0400, Gregory Price wrote:
>> > On Sat, Apr 12, 2025 at 12:49:18AM +0100, Matthew Wilcox wrote:
>> > > On Fri, Apr 11, 2025 at 06:11:05PM -0400, Gregory Price wrote:
>> > > > Unmapped page cache pages can be demoted to low-tier memory, but
>> > >
>> > > No. Page cache should never be demoted to low-tier memory.
>> > > NACK this patchset.

Hi Matthew,

Could you please give some context on why page cache shouldn't be
considered a demotion target when demotion is enabled? Wouldn't
demoting page cache pages to a lower tier (when there is enough space in
the lower tier) be a better alternative than discarding these pages and
later doing I/O to read them back in?

>> >
>> > This wasn't a statement of approval page cache being on lower tiers,
>> > it's a statement of fact. Enabling demotion causes this issue.
>>
>> Then that's the bug that needs to be fixed. Not adding 200+ lines
>> of code to recover from a situation that should never happen.

/me goes and checks when the demotion feature was added...

Ok, so I believe this was added here [1]
"[PATCH -V10 4/9] mm/migrate: demote pages during reclaim".
[1]: https://lore.kernel.org/all/20210715055145.195411-5-ying.huang@xxxxxxxxx/T/#u

I think systems with persistent memory acting as DRAM nodes could choose
to demote page cache pages to a lower tier too, instead of simply
discarding them and later doing I/O to read them back from disk.

e.g. when one has a smaller DRAM as the faster tier and a larger
PMEM as the slower tier. During memory pressure on the faster tier,
demoting page cache pages to the slower tier can help avoid doing I/O
later to read them back in, can't it?

>
> Well, I have a use case that make valuable use of putting the page cache
> on a farther node rather than pushing it out to disk. But this
> discussion aside, I think we could simply make this a separate mode of
> demotion_enabled
>
> /* Only demote anonymous memory */
> echo 2 > /sys/kernel/mm/numa/demotion_enabled
>

If we are going down this road... then should we consider what other
choices users may need for their use cases? e.g.

0: Demotion disabled
1: Demotion enabled for both anon and file pages
Support for the two modes above is already present.

2: Demotion enabled only for anon pages
3: Demotion enabled only for file pages

Should this be further split into dirty vs. clean page cache
pages too?
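
A rough userspace sketch of how the four values listed above could map
onto a per-folio decision. This is only an illustration: the enum names
and demotion_allowed() are hypothetical, values 2 and 3 are proposals
from this thread rather than existing behaviour, and the dirty vs. clean
split is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the demotion_enabled values discussed above. */
enum demotion_mode {
	DEMOTION_DISABLED  = 0,	/* present today */
	DEMOTION_ALL       = 1,	/* present today: anon and file pages */
	DEMOTION_ANON_ONLY = 2,	/* proposed in this thread */
	DEMOTION_FILE_ONLY = 3,	/* proposed in this thread */
};

/* Mode 1 has to keep meaning "demote everything" for existing users. */
static_assert(DEMOTION_ALL == 1, "mode 1 must keep today's meaning");

/*
 * Hypothetical helper: may a folio of the given kind be demoted under
 * this mode? How the caller decides anon vs. file is a separate
 * question.
 */
static bool demotion_allowed(enum demotion_mode mode, bool folio_is_anon)
{
	switch (mode) {
	case DEMOTION_ALL:
		return true;
	case DEMOTION_ANON_ONLY:
		return folio_is_anon;
	case DEMOTION_FILE_ONLY:
		return !folio_is_anon;
	case DEMOTION_DISABLED:
	default:
		return false;
	}
}
```

Keeping 0 and 1 unchanged means existing setups would see no behaviour
change; only writers of 2 or 3 would opt in to the narrower policies.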

> Assuming we can recognize anon from just struct folio

I am not 100% sure of this, so others should correct me. Should this
simply be folio_is_file_lru() to differentiate page cache pages?

Although this might still give us anon pages which have had
PG_swapbacked dropped as a result of MADV_FREE. Not sure if that needs
any special care though?
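
To make the MADV_FREE concern concrete, here is a toy userspace model.
It assumes, as far as I understand the current code, that
folio_is_file_lru() boils down to "PG_swapbacked is not set" and that
folio_test_anon() looks at the mapping instead. struct toy_folio and
the toy_*() helpers are made up for illustration and are not kernel
APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct folio: only the two bits discussed here. */
struct toy_folio {
	bool swapbacked;	/* models PG_swapbacked */
	bool anon_mapping;	/* models what folio_test_anon() checks */
};

/* Models folio_is_file_lru(), assumed to be "not swap-backed". */
static bool toy_is_file_lru(const struct toy_folio *f)
{
	return !f->swapbacked;
}

/* Models folio_test_anon(): a property of the mapping, not of a flag. */
static bool toy_test_anon(const struct toy_folio *f)
{
	return f->anon_mapping;
}

/* Models MADV_FREE on an anon folio: PG_swapbacked gets dropped. */
static void toy_mark_lazyfree(struct toy_folio *f)
{
	assert(f->anon_mapping);
	f->swapbacked = false;
}
```

In this model a lazyfree anon folio is reported as "file LRU" while
still being anon by mapping, so an anon-only or file-only mode keyed on
folio_is_file_lru() alone would put it in the file bucket. Whether that
matters in practice is the open question above.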


-ritesh