Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
From: Rakie Kim
Date: Thu Apr 10 2025 - 03:54:50 EST
On Wed, 9 Apr 2025 11:51:36 -0700 Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> Rakie Kim wrote:
> > On Tue, 8 Apr 2025 20:54:48 -0700 Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > > Dan Williams wrote:
> > > > Rakie Kim wrote:
> > > > > Previously, the weighted interleave sysfs structure was statically
> > > > > managed during initialization. This prevented new nodes from being
> > > > > recognized when memory hotplug events occurred, limiting the ability
> > > > > to update or extend sysfs entries dynamically at runtime.
> > > > >
> > > > > To address this, this patch refactors the sysfs infrastructure and
> > > > > encapsulates it within a new structure, `sysfs_wi_group`, which holds
> > > > > both the kobject and an array of node attribute pointers.
> > > > >
> > > > > By allocating this group structure globally, the per-node sysfs
> > > > > attributes can be managed beyond initialization time, enabling
> > > > > external modules to insert or remove node entries in response to
> > > > > events such as memory hotplug or node online/offline transitions.
> > > > >
> > > > > Instead of allocating all per-node sysfs attributes at once, the
> > > > > initialization path now uses the existing sysfs_wi_node_add() and
> > > > > sysfs_wi_node_delete() helpers. This refactoring makes it possible
> > > > > to modularly manage per-node sysfs entries and ensures the
> > > > > infrastructure is ready for runtime extension.
> > > > >
> > > > > Signed-off-by: Rakie Kim <rakie.kim@xxxxxx>
> > > > > Signed-off-by: Honggyu Kim <honggyu.kim@xxxxxx>
> > > > > Signed-off-by: Yunjeong Mun <yunjeong.mun@xxxxxx>
> > > > > Reviewed-by: Gregory Price <gourry@xxxxxxxxxx>
> > > > > ---
> > > > > mm/mempolicy.c | 61 ++++++++++++++++++++++++--------------------------
> > > > > 1 file changed, 29 insertions(+), 32 deletions(-)
> > > > >
> > > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > > > index 0da102aa1cfc..988575f29c53 100644
> > > > > --- a/mm/mempolicy.c
> > > > > +++ b/mm/mempolicy.c
> > > > > @@ -3419,6 +3419,13 @@ struct iw_node_attr {
> > > > > int nid;
> > > > > };
> > > > >
> > > > > +struct sysfs_wi_group {
> > > > > + struct kobject wi_kobj;
> > > > > + struct iw_node_attr *nattrs[];
> > > > > +};
> > > > > +
> > > > > +static struct sysfs_wi_group *wi_group;
> > > > > +
> > > > > static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> > > > > char *buf)
> > > > > {
> > > > > @@ -3461,27 +3468,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > > > > return count;
> > > > > }
> > > > >
> > > > > -static struct iw_node_attr **node_attrs;
> > > > > -
> > > > > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > > > > - struct kobject *parent)
> > > > > +static void sysfs_wi_node_delete(int nid)
> > > > > {
> > > > > - if (!node_attr)
> > > > > + if (!wi_group->nattrs[nid])
> > > > > return;
> > > > > - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> > > > > - kfree(node_attr->kobj_attr.attr.name);
> > > > > - kfree(node_attr);
> > > > > +
> > > > > + sysfs_remove_file(&wi_group->wi_kobj,
> > > > > + &wi_group->nattrs[nid]->kobj_attr.attr);
> > > >
> > > > This still looks broken to me, but I think this is more a problem that
> > > > was present in the original code.
> > > >
> > > > At this point @wi_group's reference count is zero because
> > > > sysfs_wi_release() has been called. However, it can only be zero if it has
> > > > properly transitioned through kobject_del() and final kobject_put(). It
> > > > follows that kobject_del() arranges for kobj->sd to be NULL. That means
> > > > that this *should* be hitting the WARN() in kernfs_remove_by_name_ns()
> > > > for the !parent case.
> > > >
> > > > So, either you are not triggering that path, or testing that path, but
> > > > sysfs_remove_file() of the child attributes should be happening *before*
> > > > sysfs_wi_release().
> > > >
> > > > Did I miss something?
> > >
> > > I think the missing change is that sysfs_wi_node_add() failures need to
> > > be done with a sysfs_wi_node_delete() of the added attrs *before* the
> > > kobject_del() of @wi_group.
> >
> > Hi Dan Williams
> >
> > Thank you very much for identifying this potential issue in the code.
> >
> > As you pointed out, this seems to be a problem that was already present in
> > the original implementation, and I agree that it needs to be addressed.
> >
> > However, since this issue existed prior to the changes in this patch
> > series, I believe it would be more appropriate to fix it in a separate
> > follow-up patch rather than include it here.
>
> I tend to disagree. The whole motivation of this series is to get the
> kobject lifetime handling correct in order to add the new dynamic
> capability. The claimed correctness fixups are incomplete. There is time
> to respin this (we are only at -rc1) and get it right before landing the
> new dynamic capability.
>
> One of the outcomes of the "MM Process" topic at LSF/MM was that Andrew
> wanted more feedback on when patches are not quite ready for prime-time
> and I think this is an example of a patch set that deserves another spin
> to meet the stated goals.
>
> > I will start preparing a new patch to address this problem, and I would
> > greatly appreciate it if you could review it once it's ready.
>
> Will definitely review it. I will leave to Andrew if he wants an
> incremental fixup on top of this series, or rebase on top of a fully
> fixed baseline. My preference is finish fixing all the old kobject()
> issues and then rebase the new dynamic work on top. Either way, do not
> be afraid to ask Andrew to replace a series in -mm, that's a sign of the
> process working as expected.

Thank you very much for your advice; I completely agree with your
recommendation. I will ask Andrew right away to drop this patch series
from -mm, and then prepare v8 to properly address the kobject-related
issues you pointed out.
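For illustration only, the teardown ordering Dan describes (remove the
per-node attribute files before kobject_del() and the final kobject_put()
of @wi_group) can be modeled in userspace. This is a minimal sketch, not
kernel code. Every identifier below (model_kobj, model_group,
model_node_delete(), model_teardown(), MODEL_NR_NODES) is hypothetical.
`has_dir` stands in for kobj->sd being non-NULL, and the warning counter
stands in for the WARN() in kernfs_remove_by_name_ns() on the !parent path.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, NOT the kernel kobject/sysfs API.
 * has_dir models kobj->sd != NULL; model_warnings counts how often the
 * equivalent of the kernfs !parent WARN() would have fired.
 */
#define MODEL_NR_NODES 4

struct model_kobj {
	bool has_dir;  /* becomes false once model_kobject_del() runs */
	int refcount;  /* the release callback would run at 0 */
};

struct model_group {
	struct model_kobj kobj;
	bool attr_present[MODEL_NR_NODES]; /* per-node attribute files */
};

static int model_warnings;

/* Models sysfs_wi_node_delete(): remove one node's attribute file. */
static void model_node_delete(struct model_group *g, int nid)
{
	if (!g->attr_present[nid])
		return;
	if (!g->kobj.has_dir)
		model_warnings++; /* parent directory already gone */
	g->attr_present[nid] = false;
}

static void model_kobject_del(struct model_kobj *kobj)
{
	kobj->has_dir = false; /* models kobj->sd becoming NULL */
}

static void model_kobject_put(struct model_kobj *kobj)
{
	assert(kobj->refcount > 0);
	kobj->refcount--;
}

/* Tear down a fully populated group; returns the number of warnings. */
static int model_teardown(bool attrs_first)
{
	struct model_group g = { .kobj = { .has_dir = true, .refcount = 1 } };
	int nid;

	for (nid = 0; nid < MODEL_NR_NODES; nid++)
		g.attr_present[nid] = true;
	model_warnings = 0;

	if (attrs_first) {
		/* Ordering suggested in review: children before the parent. */
		for (nid = 0; nid < MODEL_NR_NODES; nid++)
			model_node_delete(&g, nid);
		model_kobject_del(&g.kobj);
		model_kobject_put(&g.kobj);
	} else {
		/* Ordering flagged in review: parent removed first. */
		model_kobject_del(&g.kobj);
		model_kobject_put(&g.kobj);
		for (nid = 0; nid < MODEL_NR_NODES; nid++)
			model_node_delete(&g, nid);
	}
	return model_warnings;
}
```

Under these assumptions, model_teardown(true) reports no warnings and
model_teardown(false) reports one per node. The model keeps the group on
the stack, so it understates the real problem: if the release callback
frees the group on the final kobject_put(), the second ordering would also
touch freed memory.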

Once again, I sincerely appreciate your thoughtful and detailed feedback.

Rakie