Re: [LSF/MM/BPF TOPIC] Unifying sources of page temperature information - what info is actually wanted?
From: Johannes Weiner
Date: Wed Feb 05 2025 - 11:06:36 EST
On Wed, Feb 05, 2025 at 11:54:05AM +0530, Bharata B Rao wrote:
> On 31-Jan-25 6:39 PM, Jonathan Cameron wrote:
> > On Fri, 31 Jan 2025 12:28:03 +0000
> > Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
> >
> >>> Here is the list of potential discussion points:
> >> ...
> >>
> >>> 2. Possibility of maintaining single source of truth for page hotness that would
> >>> maintain hot page information from multiple sources and let other sub-systems
> >>> use that info.
> >> Hi,
> >>
> >> I was thinking of proposing a separate topic on a single source of hotness,
> >> but this question covers it so I'll add some thoughts here instead.
> >> I think we are very early, but sharing some experience and thoughts in a
> >> session may be useful.
> >
> > Thinking more on this over lunch, I think it is worth calling this out as a
> > potential session topic in its own right rather than trying to find
> > time within other sessions. Hence the title change.
> >
> > I think a session would start with a brief listing of the temperature sources
> > we have and those on the horizon to motivate what we are unifying, then
> > discussion to focus on need for such a unification + requirements
> > (maybe with a straw man).
>
> Here is a compilation of available temperature sources and how the
> hot/access data is consumed by different subsystems:

This is super useful, thanks for collecting this.

> PA-Physical address available
> VA-Virtual address available
> AA-Access time available
> NA-accessing Node info available
>
> I have left the slot blank for those which I am not sure about.
> ==================================================
> Temperature             PA  VA  AA  NA
> source
> ==================================================
> PROT_NONE faults        Y   Y   Y   Y
> --------------------------------------------------
> folio_mark_accessed()   Y       Y   Y
> --------------------------------------------------

For fma(), the VA info is available on the unmap path, but in most
cases it isn't available - or doesn't meaningfully exist, as with
unmapped buffered IO. I'd say it's an N.

> PTE A bit               Y   Y   N   N
> --------------------------------------------------
> Platform hints          Y   Y   Y   Y
> (AMD IBS)
> --------------------------------------------------
> Device hints            Y
> (CXL HMU)
> ==================================================
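
As an illustration only, the capability columns in the table above could be encoded as per-source flags, so that a consumer can ask which sources provide the information it needs. Everything named here (temp_cap, temp_source, source_provides(), count_sources_providing()) is hypothetical and not an existing kernel interface. Blank cells are treated as "not available", and the folio_mark_accessed() row assumes VA is N as suggested in the reply above; both are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags mirroring the PA/VA/AA/NA columns of the table. */
enum temp_cap {
	TEMP_CAP_PA = 1 << 0,	/* physical address available */
	TEMP_CAP_VA = 1 << 1,	/* virtual address available */
	TEMP_CAP_AA = 1 << 2,	/* access time available */
	TEMP_CAP_NA = 1 << 3,	/* accessing node info available */
};

#define TEMP_CAP_ALL (TEMP_CAP_PA | TEMP_CAP_VA | TEMP_CAP_AA | TEMP_CAP_NA)

struct temp_source {
	const char *name;
	unsigned int caps;
};

/* Assumption: blank table cells are encoded as "not available". */
static const struct temp_source temp_sources[] = {
	{ "PROT_NONE faults",            TEMP_CAP_ALL },
	{ "folio_mark_accessed()",       TEMP_CAP_PA | TEMP_CAP_AA | TEMP_CAP_NA },
	{ "PTE A bit",                   TEMP_CAP_PA | TEMP_CAP_VA },
	{ "Platform hints (AMD IBS)",    TEMP_CAP_ALL },
	{ "Device hints (CXL HMU)",      TEMP_CAP_PA },
};

#define NR_TEMP_SOURCES (sizeof(temp_sources) / sizeof(temp_sources[0]))

/* True if the source provides every capability in @required. */
static bool source_provides(const struct temp_source *src,
			    unsigned int required)
{
	return (src->caps & required) == required;
}

/* Count the sources that provide every capability in @required. */
static size_t count_sources_providing(unsigned int required)
{
	size_t i, n = 0;

	for (i = 0; i < NR_TEMP_SOURCES; i++)
		if (source_provides(&temp_sources[i], required))
			n++;
	return n;
}
```

A consumer that needs both the virtual address and the accessing node would call count_sources_providing(TEMP_CAP_VA | TEMP_CAP_NA) and, under the assumptions above, be left with PROT_NONE faults and the platform hints.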

For the following table, it might be useful to add *when* the source
produces this information. Sampling frequency is a likely challenge:
consumers have different requirements, and overhead should be limited
to the minimum required to serve enabled consumers.

Here is an (incomplete) attempt - sorry about the long lines:

> And here is an attempt to compile how different subsystems
> use the above data:
> ==================================================================================================
> Source                Subsystem       Consumption             Activation/Frequency
> ==================================================================================================
> PROT_NONE faults      NUMAB           NUMAB=1 locality based  While task is running,
> via process pgtable                   balancing               rate varies on observed
> walk                                  NUMAB=2 hot page        locality and sysctl knobs.
>                                       promotion
> ==================================================================================================
> folio_mark_accessed() FS/filemap/GUP  LRU list activation     On cache access and unmap
> ==================================================================================================
> PTE A bit via         Reclaim:LRU     LRU list activation,    During memory pressure
> rmap walk                             deactivation/demotion
> ==================================================================================================
> PTE A bit via         Reclaim:MGLRU   LRU list activation,    - During memory pressure
> rmap walk and process                 deactivation/demotion   - Continuous sampling (configurable)
> pgtable walk                                                    for workingset reporting
> ==================================================================================================
> PTE A bit via         DAMON           LRU activation,         Continuous sampling (configurable)?
> rmap walk                             hot page promotion,     (I believe SJ is looking into
>                                       demotion etc            auto-tuning this).
> ==================================================================================================
> Platform hints        NUMAB           NUMAB=1 Locality based
> (AMD IBS)                             balancing and
>                                       NUMAB=2 hot page
>                                       promotion
> ==================================================================================================
> Device hints          NUMAB           NUMAB=2 hot page
>                                       promotion
> ==================================================================================================
> The last two are listed as possibilities.
>
> Feel free to correct/clarify and add more.
>
> Regards,
> Bharata.
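
One possible reading of the point above about limiting overhead to what enabled consumers require: a source samples at the shortest interval requested by any enabled consumer, and not at all when none is enabled. The sketch below is a minimal illustration of that policy, not a proposal for an actual interface; struct temp_consumer, its fields, and effective_interval_ms() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one consumer of a temperature source. */
struct temp_consumer {
	const char *name;
	bool enabled;
	unsigned int interval_ms;	/* longest sampling interval it tolerates */
};

/*
 * Pick the sampling interval for a source: the shortest interval among
 * enabled consumers. Returns 0 when no consumer is enabled, meaning the
 * source can stop sampling altogether. Consumers that request an
 * interval of 0 are ignored.
 */
static unsigned int effective_interval_ms(const struct temp_consumer *c,
					  size_t n)
{
	unsigned int min = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!c[i].enabled || c[i].interval_ms == 0)
			continue;
		if (min == 0 || c[i].interval_ms < min)
			min = c[i].interval_ms;
	}
	return min;
}
```

With this policy a disabled consumer never affects the rate, so enabling a slow consumer next to a fast one adds no sampling overhead. A real implementation would also have to deal with consumers that want event-driven rather than periodic data, which this sketch ignores.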