Re: [PATCH v3 3/3] x86/resctrl: Implement rename op for mon groups

From: Reinette Chatre
Date: Thu Mar 02 2023 - 17:26:59 EST


Hi Peter,

On 3/2/2023 6:26 AM, Peter Newman wrote:
> On Sat, Feb 11, 2023 at 12:21 AM Reinette Chatre
> <reinette.chatre@xxxxxxxxx> wrote:
>
>> On 1/25/2023 2:13 AM, Peter Newman wrote:
>
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -3238,6 +3238,80 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
>>> return ret;
>>> }
>>>
>>> +static void mongrp_move(struct rdtgroup *rdtgrp, struct rdtgroup *new_prdtgrp,
>>> + cpumask_var_t cpus)
>>
>> Could you please add some function comments on what is moved and how it is accomplished?
>
> Sure, I think I should also make the name more descriptive. I think
> "move" is too high-level here.
>
>
>>> + for_each_process_thread(p, t) {
>>> + if (is_closid_match(t, prdtgrp) && is_rmid_match(t, rdtgrp))
>>> + rdt_move_one_task(t, new_prdtgrp->closid, t->rmid,
>>> + cpus);
>>> + }
>>> + read_unlock(&tasklist_lock);
>>
>> Can rdt_move_group_tasks() be used here?
>
> As it is today, rdt_move_group_tasks() would move too many tasks.
> mongrp_move() needs both the CLOSID and RMID to match, while
> rdt_move_group_tasks() only needs 0-1 of the two to match.
>
> I tried adding more parameters to rdt_move_group_tasks() to change the
> move condition, but I couldn't make the resulting code not look gross
> and after factoring the tricky stuff into rdt_move_one_task(),
> rdt_move_group_tasks() didn't look interesting enough to reuse.

Could it be made readable by adding a compare function as parameter to
rdt_move_group_tasks() that is used to determine if a task should be moved?

>>> +
>>> + update_closid_rmid(cpus, NULL);
>>> +}
>>
>> I see the tasks in a monitor group handled. There is another way to create/manage
>> a monitor group. For example, by not writing tasks to the tasks file but instead
>> to write CPU ids to the CPU file. All tasks on a particular CPU will be monitored
>> by that group. One rule is that a MON group may only have CPUs that are owned by
>> the CTRL_MON group.
>> It is not clear to me how such a group is handled in this work.
>
> Right, I overlooked this form of monitoring.
>
> I don't think it makes sense to move such a monitoring group, because a
> CPU can only be assigned to one CTRL_MON group. Therefore a move of a MON
> group with an assigned CPU will always violate the parent CTRL_MON group
> rule after the move.
>
> Based on this, I think rename of a MON group should fail when
> rdtgrp->cpu_mask is non-zero.

I think this is fair. Also please write something useful in the last
command status buffer.

>>> +
>>> +static int rdtgroup_rename(struct kernfs_node *kn,
>>> + struct kernfs_node *new_parent, const char *new_name)
>>> +{
>>> + struct rdtgroup *new_prdtgrp;
>>> + struct rdtgroup *rdtgrp;
>>> + cpumask_var_t tmpmask;
>>> + int ret;
>>> +
>>> + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
>>> + return -ENOMEM;
>>> +
>>> + rdtgrp = kernfs_to_rdtgroup(kn);
>>> + new_prdtgrp = kernfs_to_rdtgroup(new_parent);
>>> + if (!rdtgrp || !new_prdtgrp) {
>>> + free_cpumask_var(tmpmask);
>>> + return -EPERM;
>>> + }
>>> +
>>
>> How robust is this against user space attempting to move files?
>
> I'm not sure I understand the question. Can you be more specific?

This commit adds support for rename to resctrl. I thus expect this
function to be called when user space attempts to move _any_ of
the files and/or directories within resctrl. This could be out of
curiosity, buggy, or maliciousness. I would like to understand how
robust this code would be against such attempts because I do not see
much checking before the preparation to do the move is started.

>>> + /* Release both kernfs active_refs before obtaining rdtgroup mutex. */
>>> + rdtgroup_kn_get(rdtgrp, kn);
>>> + rdtgroup_kn_get(new_prdtgrp, new_parent);
>>> +
>>> + mutex_lock(&rdtgroup_mutex);
>>> +
>>> + if ((rdtgrp->flags & RDT_DELETED) || (new_prdtgrp->flags & RDT_DELETED)) {
>>> + ret = -ESRCH;
>>> + goto out;
>>> + }
>>> +
>>> + /* Only a mon group can be moved to a new mon_groups directory. */
>>> + if (rdtgrp->type != RDTMON_GROUP ||
>>> + !is_mon_groups(new_parent, kn->name)) {
>>> + ret = -EPERM;
>>> + goto out;
>>> + }
>>> +
>>
>> Should in-place moves be allowed?
>
> I don't think it's useful in the context of the intended use case.
> Also, it looks like kernfs_rename() would fail with EEXIST if I tried.
>

>From what I can tell kernfs_rename() will return EEXIST if there
is an attempt to create file/directory with the same name at the same place.
What I am asking about is if user space requests to change the name
of a monitoring group without moving it to a new resource group. This seems
to be supported by this code but if it is supported it could likely be
done more efficiently since no tasks need to be moved because neither
closid nor rmid needs to change.

> If it were important to someone, I could make it succeed before calling
> into kernfs_rename().
>
>
>>
>>> + ret = kernfs_rename(kn, new_parent, new_name);
>>> + if (ret)
>>> + goto out;
>>> +
>>> + mongrp_move(rdtgrp, new_prdtgrp, tmpmask);
>>> +
>>
>> Can tmpmask allocation/free be done in mongrp_move()?
>
> Yes, but it looked like most other functions in this file allocate the
> cpumask up front before validating parameters. If you have a preference
> for internalizing its allocation within mongrp_move(), I can try it.

Could you please elaborate what the concern is? From what I can tell
mongrp_move() is the only user of the cpumask so it is not clear to
me why it cannot take care of the allocation and free.

When referring to existing code I assume you mean rdtgroup_rmdir(). This
is the only code I could find in this file with this pattern. I looked
back at the history and indeed the cpumask was allocated where it was
used but the flow was changed to the current when support for monitoring
groups were added. See f9049547f7e7 ("x86/intel_rdt: Separate the ctrl bits from rmdir")
I do not see a requirement for doing the allocations in that way.

Reinette