Re: [RFC PATCH 2/4] mm: kpromoted: Hot page info collection and promotion daemon
From: Gregory Price
Date: Mon Mar 17 2025 - 11:05:43 EST
On Mon, Mar 17, 2025 at 09:09:18AM +0530, Bharata B Rao wrote:
> On 13-Mar-25 10:14 PM, Davidlohr Bueso wrote:
> > On Thu, 06 Mar 2025, Bharata B Rao wrote:
> >
> > > +static int page_should_be_promoted(struct page_hotness_info *phi)
> > > +{
> > > + struct page *page = pfn_to_online_page(phi->pfn);
> > > + unsigned long now = jiffies;
> > > + struct folio *folio;
> > > +
> > > + if (!page || is_zone_device_page(page))
> > > + return false;
> > > +
> > > + folio = page_folio(page);
> > > + if (!folio_test_lru(folio)) {
> > > + count_vm_event(KPROMOTED_MIG_NON_LRU);
> > > + return false;
> > > + }
> > > + if (folio_nid(folio) == phi->hot_node) {
> > > + count_vm_event(KPROMOTED_MIG_RIGHT_NODE);
> > > + return false;
> > > + }
> >
> > How about using the LRU age itself:
>
> Sounds like a good check for page hotness.
>
> >
> > if (folio_test_active())
> > return true;
>
> But the numbers I obtained with this check added, didn't really hit this
> condition all that much. I was running a multi-threaded application that
> allocates enough memory such that the allocation spills over from DRAM node
> to the CXL node. Threads keep touching the memory pages in random order.
>
Is demotion enabled by any chance?
i.e. are you sure it's actually allocating from CXL and not demoting
cold stuff to CXL?
> kpromoted_recorded_accesses 960620 /* Number of recorded accesses */
> kpromoted_recorded_hwhints 960620 /* Nr accesses via HW hints, IBS in this
> case */
> kpromoted_recorded_pgtscans 0
> kpromoted_record_toptier 638006 /* Nr toptier accesses */
> kpromoted_record_added 321234 /* Nr (CXL) accesses that are tracked */
> kpromoted_record_exists 1380
> kpromoted_mig_right_node 0
> kpromoted_mig_non_lru 226
> kpromoted_mig_lru_active 47 /* Number of accesses considered for promotion
> as determined by folio_test_active() check */
> kpromoted_mig_cold_old 0
> kpromoted_mig_cold_not_accessed 1373
> kpromoted_mig_candidate 319635
> kpromoted_mig_promoted 319635
> kpromoted_mig_dropped 1599
>
> Need to check why is this the case.
>
> >
> > > +
> > > + /* If the page was hot a while ago, don't promote */
> > > + if ((now - phi->last_update) > 2 *
> > > msecs_to_jiffies(KPROMOTED_FREQ_WINDOW)) {
> > > + count_vm_event(KPROMOTED_MIG_COLD_OLD);
> > > + return false;
> > > + }
> > > +
> > > + /* If the page hasn't been accessed enough number of times,
> > > don't promote */
> > > + if (phi->frequency < KPRMOTED_FREQ_THRESHOLD) {
> > > + count_vm_event(KPROMOTED_MIG_COLD_NOT_ACCESSED);
> > > + return false;
> > > + }
> > > + return true;
> > > +}
> >
> > ...
> >
> > > +static int kpromoted(void *p)
> > > +{
> > > + pg_data_t *pgdat = (pg_data_t *)p;
> > > + struct task_struct *tsk = current;
> > > + long timeout = msecs_to_jiffies(KPROMOTE_DELAY);
> > > +
> > > + const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
> > > +
> > > + if (!cpumask_empty(cpumask))
> > > + set_cpus_allowed_ptr(tsk, cpumask);
> >
> > Explicit cpumasks are not needed if you use kthread_create_on_node().
>
> Thanks, will incorporate.
>
> Regards,
> Bharata.