Re: [2/2] drm/msm: Add support for GPU cooling

From: Akhil P Oommen
Date: Tue Oct 13 2020 - 09:53:43 EST


On 10/12/2020 11:10 PM, mka@xxxxxxxxxxxx wrote:
On Mon, Oct 12, 2020 at 07:03:51PM +0530, Akhil P Oommen wrote:
On 10/10/2020 12:06 AM, mka@xxxxxxxxxxxx wrote:
Hi Akhil,

On Thu, Oct 08, 2020 at 10:39:07PM +0530, Akhil P Oommen wrote:
Register GPU as a devfreq cooling device so that it can be passively
cooled by the thermal framework.

Signed-off-by: Akhil P Oommen <akhilpo@xxxxxxxxxxxxxx>
---
drivers/gpu/drm/msm/msm_gpu.c | 13 ++++++++++++-
drivers/gpu/drm/msm/msm_gpu.h | 2 ++
2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 55d1648..93ffd66 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -14,6 +14,7 @@
#include <generated/utsrelease.h>
#include <linux/string_helpers.h>
#include <linux/devfreq.h>
+#include <linux/devfreq_cooling.h>
#include <linux/devcoredump.h>
#include <linux/sched/task.h>
@@ -107,9 +108,18 @@ static void msm_devfreq_init(struct msm_gpu *gpu)
if (IS_ERR(gpu->devfreq.devfreq)) {
DRM_DEV_ERROR(&gpu->pdev->dev, "Couldn't initialize GPU devfreq\n");
gpu->devfreq.devfreq = NULL;
+ return;
}
devfreq_suspend_device(gpu->devfreq.devfreq);
+
+ gpu->cooling = of_devfreq_cooling_register(gpu->pdev->dev.of_node,
+ gpu->devfreq.devfreq);
+ if (IS_ERR(gpu->cooling)) {
+ DRM_DEV_ERROR(&gpu->pdev->dev,
+ "Couldn't register GPU cooling device\n");
+ gpu->cooling = NULL;
+ }
}
static int enable_pwrrail(struct msm_gpu *gpu)
@@ -926,7 +936,6 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
msm_devfreq_init(gpu);
-
Will remove this unintended change.
gpu->aspace = gpu->funcs->create_address_space(gpu, pdev);
if (gpu->aspace == NULL)
@@ -1005,4 +1014,6 @@ void msm_gpu_cleanup(struct msm_gpu *gpu)
gpu->aspace->mmu->funcs->detach(gpu->aspace->mmu);
msm_gem_address_space_put(gpu->aspace);
}
+
+ devfreq_cooling_unregister(gpu->cooling);

Resources should be released in reverse order, otherwise the cooling device
could use resources that have already been freed.
Why do you think this is not the correct order? If you are thinking
about devfreq struct, it is managed device resource.

I did not check specifically if changing the frequency really uses any of the
resources that are released previously. In any case it's not a good idea to
allow other parts of the kernel to use a half initialized/torn down device.
Even if it isn't a problem today someone could change the driver to use any
of these resources (or add a new one) in a frequency change, without even
thinking about the cooling device, just (rightfully) assuming that things are
set up and torn down in a sane order.
'Sane order' relative to what, specifically? Do we need to worry about a
frequency change at this point, given that GPU runtime PM and devfreq have
already been disabled?
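
To make the ordering question concrete, here is a minimal user-space sketch,
not driver code. Everything in it is hypothetical: struct fake_gpu, the
fake_* helpers and the two teardown functions are invented for illustration,
and it assumes (as in the hypothetical future change described above) that a
cooling callback touches a resource that cleanup releases. It does not model
runtime PM or devfreq already being disabled, so it only shows the window
under discussion, not whether that window is reachable in msm today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for driver state; this is not the msm API. */
struct fake_gpu {
	bool resource_alive;	/* some resource released during cleanup */
	bool cooling_alive;	/* cooling device still registered */
	int stale_uses;		/* callbacks that saw a released resource */
};

static void fake_init(struct fake_gpu *gpu)
{
	assert(gpu != NULL);
	gpu->resource_alive = true;
	gpu->cooling_alive = true;
	gpu->stale_uses = 0;
}

/*
 * Models a thermal-triggered frequency change. Assumption: the change
 * touches the resource, which is exactly the case being debated.
 */
static void fake_cooling_event(struct fake_gpu *gpu)
{
	if (!gpu->cooling_alive)
		return;		/* unregistered: nothing can call in anymore */
	if (!gpu->resource_alive)
		gpu->stale_uses++;	/* would be a use-after-free in real code */
}

/* Release the resource first and unregister the cooling device last. */
static int teardown_unregister_last(struct fake_gpu *gpu)
{
	gpu->resource_alive = false;
	fake_cooling_event(gpu);	/* event arrives inside the window */
	gpu->cooling_alive = false;
	return gpu->stale_uses;
}

/* Unregister the cooling device first, then release the resource. */
static int teardown_unregister_first(struct fake_gpu *gpu)
{
	gpu->cooling_alive = false;
	fake_cooling_event(gpu);	/* same event, now ignored */
	gpu->resource_alive = false;
	return gpu->stale_uses;
}
```

Calling fake_init() and then one of the teardown functions on a fresh
struct fake_gpu returns the number of stale accesses for that ordering. The
difference between the two only matters if a cooling event can still arrive
during cleanup, which is the open question in this thread.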

-Akhil.