On Sun, 23 Mar 2025 23:44:02 +0530 Raghavendra K T wrote
> On 3/21/2025 4:23 PM, Hillf Danton wrote:
>> On Wed, 19 Mar 2025 19:30:24 +0000 Raghavendra K T wrote
>>> One of the key challenges in PTE A bit based scanning is to find the right
>>> target node to promote to.
>>>
>>> Here is a simple heuristic based approach:
>>> While scanning pages of any mm we also scan toptier pages that belong
>>> to that mm. We get an insight on the distribution of pages that potentially
>>> belong to a particular toptier node and also its recent access.
>>>
>>> Current logic walks all the toptier nodes, and picks the one with highest
>>> accesses.
>>
>> My $.02 for selecting promotion target node given a simple multi tier system.
>>
>> Tk /* top Tierk (k > 0) has K (K > 0) nodes */
>> ...
>> Tj /* Tierj (j > 0) has J (J > 0) nodes */
>> ...
>> T0 /* bottom Tier0 has O (O > 0) nodes */
>>
>> Unless config comes from user space (sysfs window for example should be opened),
>>
>> 1, adopt the data flow pattern of L3 cache <--> DRAM <--> SSD, to only
>> select Tj+1 when promoting pages in Tj.
>
> Hello Hillf,
> Thanks for giving a thought on this. This looks to be a good idea in
> general. It could mostly be implemented as the reverse of the preferred
> demotion target?
>
> Thinking out loud, can there be exception cases similar to non-temporal copy
> operations, where we don't want to pollute the cache?
> I mean cases where we don't want to hop via the middle tier node..?

Given page cache, direct IO and coherent DMA have their roles to play.
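
Point 1 above (promote pages in Tj only to Tj+1) could be sketched in plain C roughly as follows. This is only an illustration of the rule, not kernel code: struct node_info, NO_TIER, promotion_tier() and promotion_candidates() are hypothetical names made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NO_TIER (-1)

/* Hypothetical description of one memory node; not a kernel structure. */
struct node_info {
	int id;		/* node id */
	int tier;	/* 0 = bottom tier, larger = faster tier */
};

/*
 * Tier that pages in 'from_tier' may be promoted to under rule 1:
 * only the next tier up, or NO_TIER when already in the top tier.
 */
static int promotion_tier(int from_tier, int top_tier)
{
	assert(from_tier >= 0 && from_tier <= top_tier);
	return from_tier < top_tier ? from_tier + 1 : NO_TIER;
}

/*
 * Collect the ids of nodes that are valid promotion targets for a page
 * living in 'from_tier'. Writes at most 'cap' ids into 'out' and
 * returns how many were written.
 */
static size_t promotion_candidates(const struct node_info *nodes, size_t n,
				   int from_tier, int top_tier,
				   int *out, size_t cap)
{
	int target = promotion_tier(from_tier, top_tier);
	size_t count = 0;

	if (target == NO_TIER)
		return 0;
	for (size_t i = 0; i < n && count < cap; i++)
		if (nodes[i].tier == target)
			out[count++] = nodes[i].id;
	return count;
}
```

With three tiers, a page in T0 gets only the T1 nodes as candidates, never a T2 node directly. An exception path that skips the middle tier, as asked about above, would need a separate rule on top of this.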
>> 2, select the node in Tj+1 that has the most free pages for promotion
>> by default.
>
> Not sure if this is productive always.

Trying to cure all pains with ONE pill wastes minutes I think.

To achieve reliable high order pages, the page allocator can not work well in
combination with kswapd and kcompactd without clear boundaries drawn
between the three parties, for example.
> for e.g.
> node 0-1 toptier (100GB)
> node2 slowtier
>
> suppose a workload (that occupies 80GB in total) is running on a CPU of node1,
> where 40GB is already in node1 and the remaining 40GB is in node2.
>
> Now is it preferred to consolidate the workload on node1 when slowtier
> data becomes hot?

Yes and no (say, a couple seconds later mm pressure rises in node0).

In case of yes, I would like to turn on autonuma in the toptier instead
without bothering to select the target node. You see a line is drawn
between autonuma and slowtier promotion now.
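
The two target selection policies discussed in this thread, picking the toptier node with the highest accesses and picking the node with the most free pages, can be put side by side in a small C sketch. It is not taken from the patch series: struct node_stat, its fields, and both helpers are hypothetical, and the candidate array is assumed to already hold only nodes of the destination tier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-node counters; field names are illustrative only. */
struct node_stat {
	int id;
	unsigned long accesses;		/* recent accesses seen while scanning */
	unsigned long free_pages;	/* free pages currently on the node */
};

/* Pick the candidate with the most recorded accesses (first wins on ties). */
static int pick_by_accesses(const struct node_stat *c, size_t n)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (c[i].accesses > c[best].accesses)
			best = i;
	return c[best].id;
}

/* Pick the candidate with the most free pages (first wins on ties). */
static int pick_by_free_pages(const struct node_stat *c, size_t n)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (c[i].free_pages > c[best].free_pages)
			best = i;
	return c[best].id;
}
```

In a case like the node0/node1 example, where the workload already runs on node1 while node0 has more free memory, the two helpers can return different nodes. That disagreement is the consolidation question raised above.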