Re: [PATCH v2 2/2] thermal: power_allocator: update once cooling devices when temp is low

From: Lukasz Luba
Date: Tue Apr 20 2021 - 16:01:57 EST

Next message: Rob Herring: "Re: [PATCH 1/1] dt-bindings: serial: Add label property for pl011"
Previous message: Jason Gunthorpe: "Re: [PATCH] RDMA/mlx4: remove an unused variable"
In reply to: Daniel Lezcano: "Re: [PATCH v2 2/2] thermal: power_allocator: update once cooling devices when temp is low"
Next in thread: Daniel Lezcano: "Re: [PATCH v2 2/2] thermal: power_allocator: update once cooling devices when temp is low"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 4/20/21 4:24 PM, Daniel Lezcano wrote:

On 20/04/2021 16:21, Lukasz Luba wrote:

Hi Daniel,

On 4/20/21 2:30 PM, Daniel Lezcano wrote:

On 19/04/2021 10:45, Lukasz Luba wrote:

[snip]

-        instance->cdev->updated = false;
+        if (update)
+            instance->cdev->updated = false;
+
          mutex_unlock(&instance->cdev->lock);
-        (instance->cdev);
+
+        if (update)
+            thermal_cdev_update(instance->cdev);

This cdev update has something bad IMHO. It is protected by a mutex but
the 'updated' field is left unprotected before calling
thermal_cdev_update().

It is not the fault of this code but how the cooling device are updated
and how it interacts with the thermal instances.

IMO, part of the core code needs to revisited.

I agree, but please check my comments below.

This change tight a bit more the knot.

Would it make sense to you if we create a function eg.
__thermal_cdev_update()

I'm not sure if I assume it right that the function would only have the:
list_for_each_entry(instance, &cdev->thermal_instances, cdev_node)

loop from the thermal_cdev_update(). But if it has only this loop then
it's too little.

And then we have:

void thermal_cdev_update(struct thermal_cooling_device *cdev)
{
         mutex_lock(&cdev->lock);
         /* cooling device is updated*/
         if (cdev->updated) {
                 mutex_unlock(&cdev->lock);
                 return;
         }

    __thermal_cdev_update(cdev);

         thermal_cdev_set_cur_state(cdev, target);

Here we are actually setting the 'target' state via:
cdev->ops->set_cur_state(cdev, target)

then we notify, then updating stats.

         cdev->updated = true;
         mutex_unlock(&cdev->lock);
         trace_cdev_update(cdev, target);

Also this trace is something that I'm using in my tests...

Yeah, I noticed right after sending the comments. All that should be
moved in the lockless function.

Agree

So this function becomes:

void thermal_cdev_update(struct thermal_cooling_device *cdev)
{
mutex_lock(&cdev->lock);
if (!cdev->updated) {
__thermal_cdev_update(cdev);
cdev->updated = true;
}
mutex_unlock(&cdev->lock);

dev_dbg(&cdev->device, "set to state %lu\n", target);
}

We end up with the trace_cdev_update(cdev, target) inside the mutex
section but that should be fine.

True, this shouldn't be an issue.

         dev_dbg(&cdev->device, "set to state %lu\n", target);
}

And in this file we do instead:

-        instance->cdev->updated = false;
+        if (update)
+            __thermal_cdev_update(instance->cdev);
           mutex_unlock(&instance->cdev->lock);
-        thermal_cdev_update(instance->cdev);

Without the line above, we are not un-throttling the devices.

Is it still true with the amended function thermal_cdev_update() ?

That new approach should work. I can test your patch with this new
functions and re-base my work on top of it.
Or you like me to write such patch and send it?

Next message: Rob Herring: "Re: [PATCH 1/1] dt-bindings: serial: Add label property for pl011"
Previous message: Jason Gunthorpe: "Re: [PATCH] RDMA/mlx4: remove an unused variable"
In reply to: Daniel Lezcano: "Re: [PATCH v2 2/2] thermal: power_allocator: update once cooling devices when temp is low"
Next in thread: Daniel Lezcano: "Re: [PATCH v2 2/2] thermal: power_allocator: update once cooling devices when temp is low"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]