Re: [LSF/MM/BPF TOPIC] Unifying sources of page temperature information - what info is actually wanted?
From: Bharata B Rao
Date: Wed Feb 05 2025 - 01:26:47 EST
On 31-Jan-25 6:39 PM, Jonathan Cameron wrote:
> On Fri, 31 Jan 2025 12:28:03 +0000
> Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
>>> Here is the list of potential discussion points:
>>> ...
>>> 2. Possibility of maintaining single source of truth for page hotness that would
>>> maintain hot page information from multiple sources and let other sub-systems
>>> use that info.
>> Hi,
>> I was thinking of proposing a separate topic on a single source of hotness,
>> but this question covers it so I'll add some thoughts here instead.
>> I think we are very early, but sharing some experience and thoughts in a
>> session may be useful.
> Thinking more on this over lunch, I think it is worth calling this out as a
> potential session topic in its own right rather than trying to find
> time within other sessions. Hence the title change.
> I think a session would start with a brief listing of the temperature sources
> we have and those on the horizon to motivate what we are unifying, then
> discussion to focus on need for such a unification + requirements
> (maybe with a straw man).
Here is a compilation of available temperature sources and how the
hot/access data is consumed by different subsystems:
PA - Physical address available
VA - Virtual address available
AA - Access time available
NA - Accessing node info available
I have left a slot blank wherever I am not sure of the answer.
==================================================
Temperature             PA      VA      AA      NA
source
==================================================
PROT_NONE faults        Y       Y       Y       Y
--------------------------------------------------
folio_mark_accessed()   Y               Y       Y
--------------------------------------------------
PTE A bit               Y       Y       N       N
--------------------------------------------------
Platform hints          Y       Y       Y       Y
(AMD IBS)
--------------------------------------------------
Device hints            Y
(CXL HMU)
==================================================
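To illustrate why these columns matter for a unified layer, here is a minimal C sketch. It is not taken from any posted patch, and every name in it (hot_field, hot_source, hot_source_provides) is hypothetical. It encodes only the three rows whose slots are all filled in above (PROT_NONE faults, PTE A bit, AMD IBS) and lets a consumer ask whether a given source supplies the fields it needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags: which pieces of information a source can supply. */
enum hot_field {
	HOT_HAS_PA = 1u << 0,	/* physical address */
	HOT_HAS_VA = 1u << 1,	/* virtual address */
	HOT_HAS_AA = 1u << 2,	/* access time */
	HOT_HAS_NA = 1u << 3,	/* accessing node */
};

/* Only sources with no blank slots in the table are listed here. */
enum hot_source {
	HOT_SRC_PROT_NONE_FAULT,
	HOT_SRC_PTE_A_BIT,
	HOT_SRC_AMD_IBS,
	HOT_SRC_COUNT,
};

static const unsigned int hot_source_fields[HOT_SRC_COUNT] = {
	[HOT_SRC_PROT_NONE_FAULT] = HOT_HAS_PA | HOT_HAS_VA | HOT_HAS_AA | HOT_HAS_NA,
	[HOT_SRC_PTE_A_BIT]       = HOT_HAS_PA | HOT_HAS_VA,
	[HOT_SRC_AMD_IBS]         = HOT_HAS_PA | HOT_HAS_VA | HOT_HAS_AA | HOT_HAS_NA,
};

/* True if 'src' supplies every field set in the 'needed' mask. */
static bool hot_source_provides(enum hot_source src, unsigned int needed)
{
	assert((unsigned int)src < HOT_SRC_COUNT);
	return (hot_source_fields[src] & needed) == needed;
}
```

For example, a consumer that needs the accessing node (as locality based balancing does) would see hot_source_provides(HOT_SRC_PTE_A_BIT, HOT_HAS_NA) return false and would have to get that information elsewhere.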
And here is an attempt to compile how different subsystems
use the above data:
==============================================================
Source                  Subsystem       Consumption
==============================================================
PROT_NONE faults        NUMAB           NUMAB=1 locality based
via process pgtable                     balancing
walk                                    NUMAB=2 hot page
                                        promotion
==============================================================
folio_mark_accessed()   FS/filemap/GUP  LRU list activation
==============================================================
PTE A bit via           Reclaim:LRU     LRU list activation,
rmap walk                               deactivation/demotion
==============================================================
PTE A bit via           Reclaim:MGLRU   LRU list activation,
rmap walk and process                   deactivation/demotion
pgtable walk
==============================================================
PTE A bit via           DAMON           LRU activation,
rmap walk                               hot page promotion,
                                        demotion etc
==============================================================
Platform hints          NUMAB           NUMAB=1 Locality based
(AMD IBS)                               balancing and
                                        NUMAB=2 hot page
                                        promotion
==============================================================
Device hints            NUMAB           NUMAB=2 hot page
                                        promotion
==============================================================
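The same table can be written down as data, which makes it easy to ask questions such as "which sources feed hot page promotion today?". The following C sketch is illustrative only: the struct, the enum and the helper are hypothetical names rather than existing kernel interfaces, and "demotion etc" for DAMON is recorded simply as demotion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags for how a subsystem consumes the access data. */
enum temp_use {
	TEMP_USE_LOCALITY_BALANCING = 1u << 0,	/* NUMAB=1 */
	TEMP_USE_HOT_PAGE_PROMOTION = 1u << 1,	/* NUMAB=2, DAMON */
	TEMP_USE_LRU_ACTIVATION     = 1u << 2,
	TEMP_USE_DEMOTION           = 1u << 3,	/* deactivation/demotion */
};

struct temp_consumer {
	const char *source;
	const char *subsystem;
	unsigned int uses;
	bool possibility;	/* listed as a possibility, not current usage */
};

static const struct temp_consumer temp_consumers[] = {
	{ "PROT_NONE faults", "NUMAB",
	  TEMP_USE_LOCALITY_BALANCING | TEMP_USE_HOT_PAGE_PROMOTION, false },
	{ "folio_mark_accessed()", "FS/filemap/GUP",
	  TEMP_USE_LRU_ACTIVATION, false },
	{ "PTE A bit (rmap walk)", "Reclaim:LRU",
	  TEMP_USE_LRU_ACTIVATION | TEMP_USE_DEMOTION, false },
	{ "PTE A bit (rmap + pgtable walk)", "Reclaim:MGLRU",
	  TEMP_USE_LRU_ACTIVATION | TEMP_USE_DEMOTION, false },
	{ "PTE A bit (rmap walk)", "DAMON",
	  TEMP_USE_LRU_ACTIVATION | TEMP_USE_HOT_PAGE_PROMOTION |
	  TEMP_USE_DEMOTION, false },
	{ "Platform hints (AMD IBS)", "NUMAB",
	  TEMP_USE_LOCALITY_BALANCING | TEMP_USE_HOT_PAGE_PROMOTION, true },
	{ "Device hints", "NUMAB",
	  TEMP_USE_HOT_PAGE_PROMOTION, true },
};

/* Count table rows that feed 'use'; optionally include the possibilities. */
static size_t temp_count_feeders(unsigned int use, bool include_possible)
{
	size_t i, n = 0;

	assert(use != 0);	/* an empty mask would match nothing */
	for (i = 0; i < sizeof(temp_consumers) / sizeof(temp_consumers[0]); i++) {
		if (!(temp_consumers[i].uses & use))
			continue;
		if (temp_consumers[i].possibility && !include_possible)
			continue;
		n++;
	}
	return n;
}
```

Calling temp_count_feeders(TEMP_USE_HOT_PAGE_PROMOTION, false) counts only the rows in use today, while passing true also counts the platform and device hint rows.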
The last two rows (platform hints and device hints) are listed as possibilities.
Feel free to correct/clarify and add more.
Regards,
Bharata.