Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management

From: Daniel Lezcano
Date: Tue Nov 03 2020 - 13:43:02 EST



Hi Lukasz,

thanks for the review and the comments.

On 23/10/2020 12:29, Lukasz Luba wrote:
> Hi Daniel,

[ ... ]

>> +
>> +config DTPM
>> +    bool "Power capping for dynamic thermal power management"
>
> Maybe starting with capital letters: Dynamic Thermal Power Management?

Ok, noted.

[ ... ]

>> +++ b/drivers/powercap/dtpm.c
>> @@ -0,0 +1,430 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright 2020 Linaro Limited
>> + *
>> + * Author: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
>> + *
>> + * The powercap based Dynamic Thermal Power Management framework
>> + * provides to the userspace a consistent API to set the power limit
>> + * on some devices.
>> + *
>> + * DTPM defines the functions to create a tree of constraints. Each
>> + * parent node is a virtual description of the aggregation of the
>> + * children. It propagates the constraints set at its level to its
>> + * children and collect the children power infomation. The leaves of
>
> s/infomation/information/

Ok, thanks

[ ... ]

>> +static struct powercap_control_type *pct;
>> +static struct dtpm *root;
>
> I wonder if it is safe to have the tree without a global lock for it,
> like a mutex tree_lock?
> I have put some comments below when the code traverses the tree.

A mutex is a heavier lock: its purpose is to allow the current process
to sleep while waiting, whereas a spinlock is very fast but runs without
preemption.

Putting a single lock in place would simplify the code, but I'm not
sure it is worth it, as that lock could become a contention point. It
would also be simpler to switch later from fine-grained locking to a
big lock than to do the opposite.

[ ... ]

>> +static void dtpm_rebalance_weight(void)
>> +{
>> +    __dtpm_rebalance_weight(root);
>> +}
>> +
>> +static void dtpm_sub_power(struct dtpm *dtpm)
>> +{
>> +    struct dtpm *parent = dtpm->parent;
>> +
>> +    while (parent) {
>
> I am not sure if it is safe for a corner case when the
> nodes are being removed from bottom to top. We don't hold a tree
> lock, so these two operations (the line above and the one below) might
> be split/preempted and 'parent' freed before taking the lock.
> Is it possible? (Note: I might have missed something like double
> locking using this local node spinlock.)

The parent cannot be freed while it still has children; that check is
done in the release node function.

>> +        spin_lock(&parent->lock);
>> +        parent->power_min -= dtpm->power_min;
>> +        parent->power_max -= dtpm->power_max;
>> +        spin_unlock(&parent->lock);
>> +        parent = parent->parent;
>> +    }
>> +
>> +    dtpm_rebalance_weight();
>> +}
>> +
>> +static void dtpm_add_power(struct dtpm *dtpm)
>> +{
>> +    struct dtpm *parent = dtpm->parent;
>> +
>> +    while (parent) {
>
> Similar here?
>
>> +        spin_lock(&parent->lock);
>> +        parent->power_min += dtpm->power_min;
>> +        parent->power_max += dtpm->power_max;
>> +        spin_unlock(&parent->lock);
>> +        parent = parent->parent;
>> +    }
>> +
>> +    dtpm_rebalance_weight();
>> +}
>> +
>> +/**
>> + * dtpm_update_power - Update the power on the dtpm
>> + * @dtpm: a pointer to a dtpm structure to update
>> + * @power_min: a u64 representing the new power_min value
>> + * @power_max: a u64 representing the new power_max value
>> + *
>> + * Function to update the power values of the dtpm node specified in
>> + * parameter. These new values will be propagated to the tree.
>> + *
>> + * Return: zero on success, -EINVAL if the values are inconsistent
>> + */
>> +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max)
>> +{
>> +    if (power_min == dtpm->power_min && power_max == dtpm->power_max)
>> +        return 0;
>> +
>> +    if (power_max < power_min)
>> +        return -EINVAL;
>> +
>> +    dtpm_sub_power(dtpm);
>> +    spin_lock(&dtpm->lock);
>> +    dtpm->power_min = power_min;
>> +    dtpm->power_max = power_max;
>> +    spin_unlock(&dtpm->lock);
>> +    dtpm_add_power(dtpm);
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * dtpm_release_zone - Cleanup when the node is released
>> + * @pcz: a pointer to a powercap_zone structure
>> + *
>> + * Do some housecleaning and update the weight on the tree. The
>> + * release will be denied if the node has children. This function must
>> + * be called by the specific release callback of the different
>> + * backends.
>> + *
>> + * Return: 0 on success, -EBUSY if there are children
>> + */
>> +int dtpm_release_zone(struct powercap_zone *pcz)
>> +{
>> +    struct dtpm *dtpm = to_dtpm(pcz);
>> +    struct dtpm *parent = dtpm->parent;
>> +
>
> I would lock the whole tree, just to play safe.
> What do you think?

I would like to keep the fine-grained locking to prevent a potential
contention point. If it turns out we hit a locking problem or a race
that calls the fine-grained scheme into question, then we can consider
switching to a tree lock.

>> +    if (!list_empty(&dtpm->children))
>> +        return -EBUSY;
>> +
>> +    if (parent) {
>> +        spin_lock(&parent->lock);
>> +        list_del(&dtpm->sibling);
>> +        spin_unlock(&parent->lock);
>> +    }
>> +
>> +    dtpm_sub_power(dtpm);
>> +
>> +    kfree(dtpm);
>> +
>> +    return 0;
>> +}

[ ... ]

>> +struct dtpm *dtpm_alloc(void)
>> +{
>> +    struct dtpm *dtpm;
>> +
>> +    dtpm = kzalloc(sizeof(*dtpm), GFP_KERNEL);
>> +    if (dtpm) {
>> +        INIT_LIST_HEAD(&dtpm->children);
>> +        INIT_LIST_HEAD(&dtpm->sibling);
>> +        spin_lock_init(&dtpm->lock);
>
> Why do we use spinlock and not mutex?

A mutex allows the calling process to sleep while waiting, which is
useful when the critical section contains blocking calls.

Here we are just changing a few values with no blocking calls, so a
spinlock is more appropriate as it is faster.

[ ... ]

>> +static int __init dtpm_init(void)
>> +{
>> +    struct dtpm_descr **dtpm_descr;
>> +
>> +    pct = powercap_register_control_type(NULL, "dtpm", NULL);
>> +    if (!pct) {
>> +        pr_err("Failed to register control type\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    for_each_dtpm_table(dtpm_descr)
>> +        (*dtpm_descr)->init(*dtpm_descr);
>
> We don't check the returned value here. It is required that the
> devices are already up and running (like cpufreq).
> But if for some reason the init() failed, maybe it's worth adding a
> field inside the dtpm_descr or dtpm struct like 'bool ready'?
> The init could then be retried later.

It would make sense to check the init return value if we wanted to
roll back what we have done. Here we don't want that: if one subsystem
fails to insert itself into the tree, it will log an error, but the
tree should continue to give access to what has been successfully
initialized.

The rollback belongs in the init() ops, not in dtpm_init().

>> +
>> +    return 0;
>> +}
>> +late_initcall(dtpm_init);
>
> The framework would start operating at late boot. We don't control
> the thermal/power issues in earlier stages.
> Although, at this late stage all other things like cpufreq should be
> ready, so the ->init() on them is likely to succeed.

Right, dtpm is accessible through sysfs for a userspace thermal daemon
doing the smart mitigation, so the initcall can happen really late.

[ ... ]

Thanks for the review.

-- Daniel


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog