Re: [PATCH v8 1/6] list_lru: allows explicit memcg and NUMA node selection

From: Chris Li
Date: Mon Dec 04 2023 - 19:31:01 EST


On Thu, Nov 30, 2023 at 12:35 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Thu, Nov 30, 2023 at 12:07:41PM -0800, Nhat Pham wrote:
> > On Thu, Nov 30, 2023 at 11:57 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Nov 30, 2023 at 11:40:18AM -0800, Nhat Pham wrote:
> > > > This patch changes list_lru interface so that the caller must explicitly
> > > > specify numa node and memcg when adding and removing objects. The old
> > > > list_lru_add() and list_lru_del() are renamed to list_lru_add_obj() and
> > > > list_lru_del_obj(), respectively.
> > >
> > > Wouldn't it be better to add list_lru_add_memcg() and
> > > list_lru_del_memcg() and have:

That is my first thought as well. If we are having two different
flavors of LRU add, one has memcg and one without. The list_lru_add()
vs list_lru_add_memcg() is the common way to do it.
> > >
> > > +bool list_lru_del(struct list_lru *lru, struct list_head *item)
> > > +{
> > > + int nid = page_to_nid(virt_to_page(item));
> > > + struct mem_cgroup *memcg = list_lru_memcg_aware(lru) ?
> > > + mem_cgroup_from_slab_obj(item) : NULL;
> > > +
> > > + return list_lru_del_memcg(lru, item, nid, memcg);
> > > +}
> > >
> > > Seems like _most_ callers will want the original versions and only
> > > a few will want the explicit memcg/nid versions. No?
> > >
> >
> > I actually did something along that line in earlier iterations of this
> > patch series (albeit with poorer naming - __list_lru_add() instead of
> > list_lru_add_memcg()). The consensus after some back and forth was
> > that the original list_lru_add() was not a very good design (the
> > better one was this new version that allows for explicit numa/memcg
> > selection). So I agreed to fix it everywhere as a prep patch.
> >
> > I don't have strong opinions here to be completely honest, but I do
> > think this new API makes more sense (at the cost of quite a bit of
> > elbow grease to fix every callsites and extra reviewing).
>
> Maybe I can shed some light since I was pushing for doing it this way.
>
> The quiet assumption that 'struct list_head *item' is (embedded in) a
> slab object that is also charged to a cgroup is a bit much, given that
> nothing in the name or documentation of the function points to that.

We can add it to the document if that is desirable.

>
> It bit us in the THP shrinker where that list head is embedded in a
> tailpage (virt_to_page(page) is fun to debug). And it caused some
> confusion in this case as well, where the zswap entry is a slab object
> but not charged (the entry descriptor is not attractive for cgroup
> accounting, only the backing memory it points to.)
>
> Yes, for most users - at least right now - the current assumption is
> accurate. The thinking was just that if we do have to differentiate
> callers now anyway, we might as well make the interface a bit more
> self-documenting and harder to misuse going forward, even if it's a
> bit more churn now.

It comes down to whether we need to have the non memcg version of API
going forward. If we don't then change the meaning of list_lru_add()
to perform the deed of list_lru_add_memcg() makes sense. My assumption
is that the non memcg version of the API does have legit usage. In
that case, it seems having the original list_lru_add() and
list_lur_add_memcg() as Mattew suggested feels more natural. What you
really want is that every caller of list_lru_add() should seriously
consider switching to list_lru_add_memcg() unless it has a very good
reason to stay in the non memcg version. Renaming and changing the
meaning of list_lru_add() is a bit confusing and has a negative impact
on the outstanding patch that uses list_lru_add(). The end of the
day, some developers still need to evaluate the call site one by one,
renaming the function is not going to help that effort. Just make it
more obvious.

Just my 2 cents, others please chime in. Just to make it clear, that
is my preference, it is not a NACK.

Chris