Re: [PATCH v1 2/2] mm: mTHP stats for pagecache folio allocations
From: Ryan Roberts
Date: Mon Jul 22 2024 - 03:37:02 EST
On 22/07/2024 04:52, Baolin Wang wrote:
>
>
> On 2024/7/14 17:05, Ryan Roberts wrote:
>> On 13/07/2024 13:54, Baolin Wang wrote:
>>>
>>>
>>> On 2024/7/13 19:00, Ryan Roberts wrote:
>>>> [...]
>>>>
>>>>>> +static int thpsize_create(int order, struct kobject *parent)
>>>>>>  {
>>>>>>  	unsigned long size = (PAGE_SIZE << order) / SZ_1K;
>>>>>> +	struct thpsize_child *stats;
>>>>>>  	struct thpsize *thpsize;
>>>>>>  	int ret;
>>>>>> +	/*
>>>>>> +	 * Each child object (currently only "stats" directory) holds a
>>>>>> +	 * reference to the top-level thpsize object, so we can drop our ref to
>>>>>> +	 * the top-level once stats is setup. Then we just need to drop a
>>>>>> +	 * reference on any children to clean everything up. We can't just use
>>>>>> +	 * the attr group name for the stats subdirectory because there may be
>>>>>> +	 * multiple attribute groups to populate inside stats and overlaying
>>>>>> +	 * using the name property isn't supported in that way; each attr group
>>>>>> +	 * name, if provided, must be unique in the parent directory.
>>>>>> +	 */
>>>>>> +
>>>>>>  	thpsize = kzalloc(sizeof(*thpsize), GFP_KERNEL);
>>>>>> -	if (!thpsize)
>>>>>> -		return ERR_PTR(-ENOMEM);
>>>>>> +	if (!thpsize) {
>>>>>> +		ret = -ENOMEM;
>>>>>> +		goto err;
>>>>>> +	}
>>>>>> +	thpsize->order = order;
>>>>>>  	ret = kobject_init_and_add(&thpsize->kobj, &thpsize_ktype, parent,
>>>>>>  				   "hugepages-%lukB", size);
>>>>>>  	if (ret) {
>>>>>>  		kfree(thpsize);
>>>>>> -		return ERR_PTR(ret);
>>>>>> +		goto err;
>>>>>>  	}
>>>>>> -	ret = sysfs_create_group(&thpsize->kobj, &thpsize_attr_group);
>>>>>> -	if (ret) {
>>>>>> +	stats = kzalloc(sizeof(*stats), GFP_KERNEL);
>>>>>> +	if (!stats) {
>>>>>>  		kobject_put(&thpsize->kobj);
>>>>>> -		return ERR_PTR(ret);
>>>>>> +		ret = -ENOMEM;
>>>>>> +		goto err;
>>>>>>  	}
>>>>>> -	ret = sysfs_create_group(&thpsize->kobj, &stats_attr_group);
>>>>>> +	ret = kobject_init_and_add(&stats->kobj, &thpsize_child_ktype,
>>>>>> +				   &thpsize->kobj, "stats");
>>>>>> +	kobject_put(&thpsize->kobj);
>>>>>>  	if (ret) {
>>>>>> -		kobject_put(&thpsize->kobj);
>>>>>> -		return ERR_PTR(ret);
>>>>>> +		kfree(stats);
>>>>>> +		goto err;
>>>>>>  	}
>>>>>> -	thpsize->order = order;
>>>>>> -	return thpsize;
>>>>>> +	if (BIT(order) & THP_ORDERS_ALL_ANON) {
>>>>>> +		ret = sysfs_create_group(&thpsize->kobj, &thpsize_attr_group);
>>>>>> +		if (ret)
>>>>>> +			goto err_put;
>>>>>> +
>>>>>> +		ret = sysfs_create_group(&stats->kobj, &stats_attr_group);
>>>>>> +		if (ret)
>>>>>> +			goto err_put;
>>>>>> +	}
>>>>>> +
>>>>>> +	if (BIT(order) & PAGECACHE_LARGE_ORDERS) {
>>>>>> +		ret = sysfs_create_group(&stats->kobj, &file_stats_attr_group);
>>>>>> +		if (ret)
>>>>>> +			goto err_put;
>>>>>> +	}
>>>>>> +
>>>>>> +	list_add(&stats->node, &thpsize_child_list);
>>>>>> +	return 0;
>>>>>> +err_put:
>>>>>
>>>>> IIUC, I think you should call 'sysfs_remove_group' to remove the group before
>>>>> putting the kobject.
>>>>
>>>> Are you sure about that? As I understood it, sysfs_create_group() was
>>>> conceptually modifying the state of the kobj, so when the kobj gets destroyed,
>>>> all its state is tidied up. __kobject_del() (called on the last kobject_put())
>>>> calls sysfs_remove_groups() and tidies up the sysfs state as far as I can see?
>>>
>>> IIUC, __kobject_del() only removes the ktype default groups via
>>> 'sysfs_remove_groups(kobj, ktype->default_groups)', but the groups you created
>>> are not added to ktype->default_groups. That means you should manually remove
>>> them, or am I missing something?
>>
>> That was also putting doubt in my mind. But the sample at
>> samples/kobject/kobject-example.c does not call sysfs_remove_group(). It just
>> calls sysfs_create_group() in example_init() and calls kobject_put() in
>> example_exit(). So I think that's the correct pattern.
>>
>> Looking at the code more closely, sysfs_create_group() just creates files for
>> each of the attributes in the group. __kobject_del() calls sysfs_remove_dir(),
>> whose comment states "we remove any files in the directory before we remove the
>> directory" so I'm pretty sure sysfs_remove_group() is not required.
>
> Thanks for the explanation, and I think you are right after checking the code
> again. Sorry for the noise.

No problem, thanks for raising it anyway; TBH, I wasn't completely sure when I
wrote it initially, so it's good to have a clear resolution.
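
For anyone following along, here is a minimal user-space sketch of the teardown
ordering discussed above. It is a toy model, not kernel code: every name in it
(model_kobj, model_create_group(), model_put() and so on) is hypothetical, and
it assumes unnamed attribute groups that only add files to the object's
directory. It just illustrates why dropping the last reference is enough when
directory removal also removes the files inside it.

```c
/* User-space model only; none of these names are kernel APIs. */
#include <assert.h>
#include <stdlib.h>

#define MAX_FILES 8

/* Stand-in for a kobject together with its sysfs directory. */
struct model_kobj {
	int refcount;
	int nr_files;		/* attribute files currently in the directory */
	int dir_present;	/* 1 while the directory exists */
};

static int files_removed;	/* files cleaned up during teardown */
static int objs_released;	/* objects freed on the final put */

/* Models kobject_init_and_add(): one reference, empty directory. */
static struct model_kobj *model_init_and_add(void)
{
	struct model_kobj *k = calloc(1, sizeof(*k));

	if (!k)
		return NULL;
	k->refcount = 1;
	k->dir_present = 1;
	return k;
}

/* Models sysfs_create_group() for an unnamed group: it only adds files. */
static int model_create_group(struct model_kobj *k, int nr_attrs)
{
	if (k->nr_files + nr_attrs > MAX_FILES)
		return -1;
	k->nr_files += nr_attrs;
	return 0;
}

/* Models directory removal: files inside go first, then the directory. */
static void model_remove_dir(struct model_kobj *k)
{
	files_removed += k->nr_files;
	k->nr_files = 0;
	k->dir_present = 0;
}

/* Models kobject_put(): the final put tears down the directory and frees. */
static void model_put(struct model_kobj *k)
{
	assert(k->refcount > 0);
	if (--k->refcount == 0) {
		model_remove_dir(k);
		objs_released++;
		free(k);
	}
}

/* Create two groups, then clean up with a bare put and no group removal. */
static int model_demo(void)
{
	struct model_kobj *k = model_init_and_add();

	if (!k)
		return -1;
	files_removed = 0;
	if (model_create_group(k, 3) || model_create_group(k, 2)) {
		model_put(k);
		return -1;
	}
	model_put(k);
	return files_removed;
}
```

In this model, model_demo() returns 5: both groups' files are removed by the
final put without any explicit per-group removal, which mirrors the pattern in
samples/kobject/kobject-example.c.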