Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug

From: Dan Williams
Date: Wed Apr 09 2025 - 14:52:31 EST


Rakie Kim wrote:
> On Tue, 8 Apr 2025 20:54:48 -0700 Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > Dan Williams wrote:
> > > Rakie Kim wrote:
> > > > Previously, the weighted interleave sysfs structure was statically
> > > > managed during initialization. This prevented new nodes from being
> > > > recognized when memory hotplug events occurred, limiting the ability
> > > > to update or extend sysfs entries dynamically at runtime.
> > > >
> > > > To address this, this patch refactors the sysfs infrastructure and
> > > > encapsulates it within a new structure, `sysfs_wi_group`, which holds
> > > > both the kobject and an array of node attribute pointers.
> > > >
> > > > By allocating this group structure globally, the per-node sysfs
> > > > attributes can be managed beyond initialization time, enabling
> > > > external modules to insert or remove node entries in response to
> > > > events such as memory hotplug or node online/offline transitions.
> > > >
> > > > Instead of allocating all per-node sysfs attributes at once, the
> > > > initialization path now uses the existing sysfs_wi_node_add() and
> > > > sysfs_wi_node_delete() helpers. This refactoring makes it possible
> > > > to modularly manage per-node sysfs entries and ensures the
> > > > infrastructure is ready for runtime extension.
> > > >
> > > > Signed-off-by: Rakie Kim <rakie.kim@xxxxxx>
> > > > Signed-off-by: Honggyu Kim <honggyu.kim@xxxxxx>
> > > > Signed-off-by: Yunjeong Mun <yunjeong.mun@xxxxxx>
> > > > Reviewed-by: Gregory Price <gourry@xxxxxxxxxx>
> > > > ---
> > > > mm/mempolicy.c | 61 ++++++++++++++++++++++++--------------------------
> > > > 1 file changed, 29 insertions(+), 32 deletions(-)
> > > >
> > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > > index 0da102aa1cfc..988575f29c53 100644
> > > > --- a/mm/mempolicy.c
> > > > +++ b/mm/mempolicy.c
> > > > @@ -3419,6 +3419,13 @@ struct iw_node_attr {
> > > > int nid;
> > > > };
> > > >
> > > > +struct sysfs_wi_group {
> > > > + struct kobject wi_kobj;
> > > > + struct iw_node_attr *nattrs[];
> > > > +};
> > > > +
> > > > +static struct sysfs_wi_group *wi_group;
> > > > +
> > > > static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> > > > char *buf)
> > > > {
> > > > @@ -3461,27 +3468,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > > > return count;
> > > > }
> > > >
> > > > -static struct iw_node_attr **node_attrs;
> > > > -
> > > > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > > > - struct kobject *parent)
> > > > +static void sysfs_wi_node_delete(int nid)
> > > > {
> > > > - if (!node_attr)
> > > > + if (!wi_group->nattrs[nid])
> > > > return;
> > > > - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> > > > - kfree(node_attr->kobj_attr.attr.name);
> > > > - kfree(node_attr);
> > > > +
> > > > + sysfs_remove_file(&wi_group->wi_kobj,
> > > > + &wi_group->nattrs[nid]->kobj_attr.attr);
> > >
> > > This still looks broken to me, but I think this is more a problem that
> > > was present in the original code.
> > >
> > > At this point @wi_group's reference count is zero because
> > > sysfs_wi_release() has been called. However, it can only be zero if it has
> > > properly transitioned through kobject_del() and final kobject_put(). It
> > > follows that kobject_del() arranges for kobj->sd to be NULL. That means
> > > that this *should* be hitting the WARN() in kernfs_remove_by_name_ns()
> > > for the !parent case.
> > >
> > > So, either you are not triggering that path, or testing that path, but
> > > sysfs_remove_file() of the child attributes should be happening *before*
> > > sysfs_wi_release().
> > >
> > > Did I miss something?
> >
> > I think the missing change is that sysfs_wi_node_add() failures need to
> > be done with a sysfs_wi_node_delete() of the added attrs *before* the
> > kobject_del() of @wi_group.
>
> Hi Dan Williams
>
> Thank you very much for identifying this potential issue in the code.
>
> As you pointed out, this seems to be a problem that was already present in
> the original implementation, and I agree that it needs to be addressed.
>
> However, since this issue existed prior to the changes in this patch
> series, I believe it would be more appropriate to fix it in a separate
> follow-up patch rather than include it here.

I tend to disagree. The whole motivation of this series is to get the
kobject lifetime handling correct in order to add the new dynamic
capability. The claimed correctness fixups are incomplete. There is time
to respin this (we are only at -rc1) and get it right before landing the
new dynamic capability.
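
To make the ordering concern from my earlier reply concrete, here is a
small userspace model of it. This is only a sketch and not kernel code:
every toy_* name is hypothetical, "sd" stands in for kobj->sd, and the
"warnings" counter stands in for the WARN() that I expect from
kernfs_remove_by_name_ns() when the parent is gone. It assumes the
release path walks nattrs[] the way sysfs_wi_node_delete() does.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 4

/* Toy stand-in for wi_group; not the real kobject API. */
struct toy_group {
	bool sd;                     /* models kobj->sd being non-NULL */
	int refcount;                /* models the kobject reference count */
	bool nattrs[TOY_MAX_NODES];  /* models a non-NULL nattrs[nid] */
	int warnings;                /* models WARN() hits on removal */
};

static void toy_group_add(struct toy_group *g)
{
	int nid;

	g->sd = true;
	g->refcount = 1;
	g->warnings = 0;
	for (nid = 0; nid < TOY_MAX_NODES; nid++)
		g->nattrs[nid] = false;
}

/* Models sysfs_wi_node_add(): only valid while the directory exists. */
static int toy_node_add(struct toy_group *g, int nid)
{
	if (!g->sd || nid < 0 || nid >= TOY_MAX_NODES)
		return -1;
	g->nattrs[nid] = true;
	return 0;
}

/* Models sysfs_wi_node_delete(): warns if the parent is already gone. */
static void toy_node_delete(struct toy_group *g, int nid)
{
	if (!g->nattrs[nid])
		return;
	if (!g->sd)
		g->warnings++;
	g->nattrs[nid] = false;
}

/* Models kobject_del() followed by the final kobject_put(). */
static void toy_group_del_and_put(struct toy_group *g)
{
	assert(g->refcount > 0);
	g->sd = false;
	g->refcount--;
}

/* Ordering I am objecting to: attributes removed from the release path. */
static int toy_teardown_release_first(struct toy_group *g)
{
	int nid;

	toy_group_del_and_put(g);
	for (nid = 0; nid < TOY_MAX_NODES; nid++)
		toy_node_delete(g, nid);
	return g->warnings;
}

/* Ordering I am asking for: attributes removed before kobject_del(). */
static int toy_teardown_attrs_first(struct toy_group *g)
{
	int nid;

	for (nid = 0; nid < TOY_MAX_NODES; nid++)
		toy_node_delete(g, nid);
	toy_group_del_and_put(g);
	return g->warnings;
}
```

In this model the first ordering records one warning per attribute that
was still present, and the second records none, which is the behavior I
would want the sysfs_wi_node_add() error path to have as well.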

One of the outcomes of the "MM Process" topic at LSF/MM was that Andrew
wanted more feedback on when patches are not quite ready for prime-time,
and I think this is an example of a patch set that deserves another spin
to meet the stated goals.

> I will start preparing a new patch to address this problem, and I would
> greatly appreciate it if you could review it once it's ready.

Will definitely review it. I will leave it to Andrew whether he wants an
incremental fixup on top of this series or a rebase on top of a fully
fixed baseline. My preference is to finish fixing all the old kobject
issues and then rebase the new dynamic work on top. Either way, do not
be afraid to ask Andrew to replace a series in -mm; that is a sign of the
process working as expected.