Re: [PATCH v1 2/2] mm: mTHP stats for pagecache folio allocations
From: Ryan Roberts
Date: Sun Jul 14 2024 - 05:06:13 EST
On 13/07/2024 13:54, Baolin Wang wrote:
>
>
> On 2024/7/13 19:00, Ryan Roberts wrote:
>> [...]
>>
>>>> +static int thpsize_create(int order, struct kobject *parent)
>>>> {
>>>> unsigned long size = (PAGE_SIZE << order) / SZ_1K;
>>>> + struct thpsize_child *stats;
>>>> struct thpsize *thpsize;
>>>> int ret;
>>>> + /*
>>>> + * Each child object (currently only "stats" directory) holds a
>>>> + * reference to the top-level thpsize object, so we can drop our ref to
>>>> + * the top-level once stats is setup. Then we just need to drop a
>>>> + * reference on any children to clean everything up. We can't just use
>>>> + * the attr group name for the stats subdirectory because there may be
>>>> + * multiple attribute groups to populate inside stats and overlaying
>>>> + * using the name property isn't supported in that way; each attr group
>>>> + * name, if provided, must be unique in the parent directory.
>>>> + */
>>>> +
>>>> thpsize = kzalloc(sizeof(*thpsize), GFP_KERNEL);
>>>> - if (!thpsize)
>>>> - return ERR_PTR(-ENOMEM);
>>>> + if (!thpsize) {
>>>> + ret = -ENOMEM;
>>>> + goto err;
>>>> + }
>>>> + thpsize->order = order;
>>>> ret = kobject_init_and_add(&thpsize->kobj, &thpsize_ktype, parent,
>>>> "hugepages-%lukB", size);
>>>> if (ret) {
>>>> kfree(thpsize);
>>>> - return ERR_PTR(ret);
>>>> + goto err;
>>>> }
>>>> - ret = sysfs_create_group(&thpsize->kobj, &thpsize_attr_group);
>>>> - if (ret) {
>>>> + stats = kzalloc(sizeof(*stats), GFP_KERNEL);
>>>> + if (!stats) {
>>>> kobject_put(&thpsize->kobj);
>>>> - return ERR_PTR(ret);
>>>> + ret = -ENOMEM;
>>>> + goto err;
>>>> }
>>>> - ret = sysfs_create_group(&thpsize->kobj, &stats_attr_group);
>>>> + ret = kobject_init_and_add(&stats->kobj, &thpsize_child_ktype,
>>>> + &thpsize->kobj, "stats");
>>>> + kobject_put(&thpsize->kobj);
>>>> if (ret) {
>>>> - kobject_put(&thpsize->kobj);
>>>> - return ERR_PTR(ret);
>>>> + kfree(stats);
>>>> + goto err;
>>>> }
>>>> - thpsize->order = order;
>>>> - return thpsize;
>>>> + if (BIT(order) & THP_ORDERS_ALL_ANON) {
>>>> + ret = sysfs_create_group(&thpsize->kobj, &thpsize_attr_group);
>>>> + if (ret)
>>>> + goto err_put;
>>>> +
>>>> + ret = sysfs_create_group(&stats->kobj, &stats_attr_group);
>>>> + if (ret)
>>>> + goto err_put;
>>>> + }
>>>> +
>>>> + if (BIT(order) & PAGECACHE_LARGE_ORDERS) {
>>>> + ret = sysfs_create_group(&stats->kobj, &file_stats_attr_group);
>>>> + if (ret)
>>>> + goto err_put;
>>>> + }
>>>> +
>>>> + list_add(&stats->node, &thpsize_child_list);
>>>> + return 0;
>>>> +err_put:
>>>
>>> IIUC, I think you should call 'sysfs_remove_group' to remove the group before
>>> putting the kobject.
>>
>> Are you sure about that? As I understood it, sysfs_create_group() was
>> conceptually modifying the state of the kobj, so when the kobj gets destroyed,
>> all its state is tidied up. __kobject_del() (called on the last kobject_put())
>> calls sysfs_remove_groups() and tidies up the sysfs state as far as I can see?
>
> IIUC, __kobject_del() only removes the ktype default groups by
> 'sysfs_remove_groups(kobj, ktype->default_groups)', but your created groups are
> not added into the ktype->default_groups. That means you should manually remove
> them, or am I missing something?

That was also putting doubt in my mind. But the sample at
samples/kobject/kobject-example.c does not call sysfs_remove_group(). It just
calls sysfs_create_group() in example_init() and calls kobject_put() in
example_exit(). So I think that's the correct pattern.

Looking at the code more closely, sysfs_create_group() just creates files for
each of the attributes in the group. __kobject_del() calls sysfs_remove_dir(),
whose comment states "we remove any files in the directory before we remove the
directory", so I'm pretty sure sysfs_remove_group() is not required.
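
To make that reasoning concrete, here is a minimal userspace sketch of the
lifetime I have in mind. It is a toy model, not kernel code: every toy_* name
and the foo/bar/baz attribute names are made up for illustration, and it only
assumes that the final put removes the directory together with every file in
it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_MAX_FILES 8

/* Toy stand-in for a kobject and its sysfs directory. Not kernel code. */
struct toy_kobj {
	int refcount;
	int nr_files;                      /* files currently in the directory */
	const char *files[TOY_MAX_FILES];
};

static struct toy_kobj *toy_kobj_create(void)
{
	struct toy_kobj *k = calloc(1, sizeof(*k));

	if (k)
		k->refcount = 1;           /* the creator holds the first ref */
	return k;
}

/* Models sysfs_create_group() for an unnamed group: one file per attribute. */
static int toy_create_group(struct toy_kobj *k, const char *const *attrs)
{
	int start = k->nr_files;

	for (; *attrs; attrs++) {
		if (k->nr_files == TOY_MAX_FILES) {
			k->nr_files = start;   /* undo the partial group */
			return -ENOSPC;
		}
		k->files[k->nr_files++] = *attrs;
	}
	return 0;
}

/* Models sysfs_remove_dir(): remove every file, then the directory itself. */
static int toy_remove_dir(struct toy_kobj *k)
{
	int removed = k->nr_files;

	k->nr_files = 0;
	return removed;
}

/*
 * Models kobject_put(): only the final put tears the directory down.
 * Returns how many files were removed as a side effect.
 */
static int toy_kobj_put(struct toy_kobj *k)
{
	int removed = 0;

	assert(k->refcount > 0);
	if (--k->refcount == 0) {
		removed = toy_remove_dir(k);
		free(k);
	}
	return removed;
}

/* Create two groups, never remove them explicitly, then drop the last ref. */
static int toy_demo(void)
{
	static const char *const group_a[] = { "foo", "bar", NULL };
	static const char *const group_b[] = { "baz", NULL };
	struct toy_kobj *k = toy_kobj_create();
	int ret, removed;

	if (!k)
		return -ENOMEM;
	ret = toy_create_group(k, group_a);
	if (!ret)
		ret = toy_create_group(k, group_b);
	removed = toy_kobj_put(k);
	return ret ? ret : removed;
}
```

Calling toy_demo() returns 3: all three files go away on the final put even
though nothing removed the groups explicitly, which is the behaviour I am
relying on in the err_put path.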

By the way, if we do choose to only populate stats if that size can be used by
anon/shmem/file, then I've found sysfs_merge_group(), which will simplify adding
named groups without needing to manually create the stats directory as I do in
this version of the patch. I'll migrate to using that approach in v2. Of course,
if we decide to take the approach of populating all stats for all sizes, that
problem goes away anyway.
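
As a rough illustration of why merging helps, here is another userspace toy
model (again not kernel code; the model_* names are hypothetical). It assumes
the semantics as I understand them: creating a named group fails with -EEXIST
if that name already exists in the parent, while merging adds files to an
existing named directory and fails with -ENOENT if the directory is missing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_FILES 8

/* Toy stand-in for a named attribute group (i.e. a subdirectory). */
struct model_group {
	const char *name;
	const char *const *attrs;          /* NULL-terminated */
};

/* Toy parent directory that can hold a single named subdirectory. */
struct model_dir {
	const char *subdir;                /* NULL until a named group creates it */
	int nr_files;                      /* files inside the subdirectory */
	const char *files[MODEL_MAX_FILES];
};

static int model_add_files(struct model_dir *d, const char *const *attrs)
{
	int start = d->nr_files;

	for (; *attrs; attrs++) {
		if (d->nr_files == MODEL_MAX_FILES) {
			d->nr_files = start;   /* undo the partial group */
			return -ENOSPC;
		}
		d->files[d->nr_files++] = *attrs;
	}
	return 0;
}

/* Models sysfs_create_group() with a named group: the name must be unused. */
static int model_create_group(struct model_dir *d, const struct model_group *g)
{
	int ret;

	assert(g->name);
	if (d->subdir)                     /* toy limit: only one subdirectory */
		return strcmp(d->subdir, g->name) ? -ENOSPC : -EEXIST;

	d->subdir = g->name;
	ret = model_add_files(d, g->attrs);
	if (ret)
		d->subdir = NULL;          /* drop the directory again on failure */
	return ret;
}

/* Models sysfs_merge_group(): the named subdirectory must already exist. */
static int model_merge_group(struct model_dir *d, const struct model_group *g)
{
	assert(g->name);
	if (!d->subdir || strcmp(d->subdir, g->name))
		return -ENOENT;
	return model_add_files(d, g->attrs);
}
```

With two groups both named "stats", the first goes in via
model_create_group(), a second model_create_group() is rejected with -EEXIST,
and model_merge_group() adds the second set of files to the same directory.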

Thanks,
Ryan