Re: [RFC v2 PATCH 0/5] Promotion of Unmapped Page Cache Folios.
From: Gregory Price
Date: Fri Dec 27 2024 - 14:10:01 EST
On Fri, Dec 27, 2024 at 10:40:36AM -0500, Gregory Price wrote:
> > Can we measure the largest improvement? For example, run the benchmark
> > with all file pages in DRAM and CXL.mem via numa binding, and compare.
>
> I can probably come up with something, will rework some stuff.
>
so I did as you suggested: I made a program that allocates a 16GB
buffer, initializes it, then membinds itself to node1 before accessing
the file (to force it into pagecache), then I ran a bunch of tests.
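For reference, the test is more or less the following sketch (file path,
chunk size, and error handling are placeholders, needs -lnuma; the
membind call is commented in/out between runs):

/* rough sketch of the read-loop test; placeholders for path and sizes */
#include <fcntl.h>
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BUF_SZ	(16UL << 30)	/* 16GB anonymous buffer */
#define CHUNK	(2UL << 20)

int main(void)
{
	char *buf = malloc(BUF_SZ);
	struct timespec t0, t1;
	int fd;

	memset(buf, 1, BUF_SZ);		/* fault the buffer in first */

	/* bind to node 1 so the file lands in CXL-backed pagecache */
	numa_set_membind(numa_parse_nodestring("1"));

	fd = open("testfile", O_RDONLY);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while (read(fd, buf, CHUNK) > 0)
		;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("Read loop took %.2f seconds\n",
	       (t1.tv_sec - t0.tv_sec) +
	       (t1.tv_nsec - t0.tv_nsec) / 1e9);
	return 0;
}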
Completely unexpected result: ~25% overhead from an inexplicable source.
baseline - no membind()
./test
Read loop took 0.93 seconds
drop caches
./test - w/ membind(1) just before file open
Read loop took 1.16 seconds
node 1 size: 262144 MB
node 1 free: 245756 MB <- file confirmed in cache
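(262144 - 245756 = 16388 MB in use, i.e. roughly the 16GB file)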
kill and relaunch without membind to avoid any funny business
./test
Read loop took 1.16 seconds
enable promotion
Read loop took 3.37 seconds <- migration overhead
... snip ...
Read loop took 1.17 seconds <- stabilizes here
node 1 size: 262144 MB
node 1 free: 262144 MB <- pagecache fully promoted off node 1
Absolutely bizarre result: there is 0% CXL usage occurring, but the
overhead we originally measured is still present.
This overhead persists even if I do the following:
- disable pagecache promotion
- disable numa_balancing
- offline CXL memory entirely
This is actually pretty wild. I presume this must imply the folio flags
are mucked up after migration and we're incurring a bunch of overhead
on access for no reason. At the very least it doesn't appear to be
an isolated folio issue:
nr_isolated_anon 0
nr_isolated_file 0
I'll have to dig into this further; I wonder if this happens with mapped
memory as well.
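Something like this (hypothetical sketch, same placeholder file path)
would cover the mapped case by walking an mmap() of the file a page at a
time instead of read()ing it:

/* hypothetical mapped-memory variant of the same timing loop */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	int fd = open("testfile", O_RDONLY);
	struct stat st;
	struct timespec t0, t1;
	volatile unsigned long sum = 0;

	fstat(fd, &st);
	char *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (off_t i = 0; i < st.st_size; i += 4096)
		sum += map[i];		/* touch every page */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("Mapped read loop took %.2f seconds\n",
	       (t1.tv_sec - t0.tv_sec) +
	       (t1.tv_nsec - t0.tv_nsec) / 1e9);
	munmap(map, st.st_size);
	return 0;
}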
~Gregory