Re: [PATCH v3] PM: EM: Fix potential division-by-zero error in em_compute_costs()

From: Yaxiong Tian
Date: Tue Apr 15 2025 - 23:05:23 EST




在 2025/4/16 01:17, Rafael J. Wysocki 写道:
On Mon, Apr 14, 2025 at 11:09 AM Yaxiong Tian <iambestgod@xxxxxx> wrote:

From: Yaxiong Tian <tianyaxiong@xxxxxxxxxx>

When the device is of a non-CPU type, table[i].performance won't be
initialized in the previous em_init_performance(), resulting in division
by zero when calculating costs in em_compute_costs().

Since the 'cost' algorithm is only used for EAS energy efficiency
calculations and is currently not utilized by other device drivers, we
should add the _is_cpu_device(dev) check to prevent this division-by-zero
issue.

Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division")

Please look at the Fixes: tags in the kernel git history. They don't
look like the one above.

Yes, there's an extra '<>' here.

Signed-off-by: Yaxiong Tian <tianyaxiong@xxxxxxxxxx>
---
kernel/power/energy_model.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index d9b7e2b38c7a..fc972cc1fc12 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -235,7 +235,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,

/* Compute the cost of each performance state. */
for (i = nr_states - 1; i >= 0; i--) {
- unsigned long power_res, cost;
+ unsigned long power_res, cost = 0;

if ((flags & EM_PERF_DOMAIN_ARTIFICIAL) && cb->get_cost) {
ret = cb->get_cost(dev, table[i].frequency, &cost);
@@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
cost, ret);
return -EINVAL;
}
- } else {
+ } else if (_is_cpu_device(dev)) {

Can't you just check this upfront at the beginning of the function and
make it bail out if dev is not a CPU device?

Sure, But the current implementation applies em_compute_costs() to both non-CPU devices and CPU devices. After carefully reviewing the latest code,
I've found this issue has expanded in scope.

There are currently three call paths for invoking em_compute_costs():

1) Registering performance domains (for both non-CPU and CPU devices)
em_dev_register_perf_domain() → em_create_pd() →
em_create_perf_table() → em_compute_costs()

2)EM update paths (CPU devices only)

Periodic 1000ms update check via em_update_work work item:
em_check_capacity_update() → em_adjust_new_capacity() → em_recalc_and_update() → em_compute_costs()

Exynos-chip initialization:
em_dev_update_chip_binning() → em_recalc_and_update() → em_compute_costs()

3) Device cost computation (non-CPU devices only - currently unused)
em_dev_compute_costs() → em_compute_costs()

Note: In em_dev_compute_costs(), when calling em_compute_costs(),
neither the callback (cb) nor flags are set.In fact, it either does
nothing at all or performs incorrect operations.

Therefore, should we mandate that non-CPU devices must provide a
get_cost callback?

So Should we add a check at the beginning of the em_compute_costs() to:

if (!_is_cpu_device(dev) && !cb->get_cost) {
dev_dbg(dev, "EM: No get_cost provided, cost unset.\n");
return 0;
}
And Modify em_dev_compute_costs() to require callers to provide the cb
callback function,Also need to update its corresponding comments.


/* increase resolution of 'cost' precision */
power_res = table[i].power * 10;
cost = power_res / table[i].performance;
--