Re: [PATCH] cpufreq: fix race on cpufreq online

From: Viresh Kumar
Date: Mon May 09 2022 - 00:07:35 EST


I had to dig the old patch first before starting to review what your
next one does.

On 21-04-22, 03:15, Schspa Shi wrote:
> When cpufreq online failed, policy->cpus are not empty while
> cpufreq sysfs file available, we may access some data freed.
>
> Take policy->clk as an example:
>
> static int cpufreq_online(unsigned int cpu)
> {
> ...
> // policy->cpus != 0 at this time
> down_write(&policy->rwsem);
> ret = cpufreq_add_dev_interface(policy);

Please keep some code to help understand where we went from here. I am
sure you meant that we will error out in this case, but you removed
the relevant code.

> up_write(&policy->rwsem);
>
> return 0;
>
> out_destroy_policy:
> for_each_cpu(j, policy->real_cpus)
> remove_cpu_dev_symlink(policy, get_cpu_device(j));
> up_write(&policy->rwsem);
> ...
> out_exit_policy:
> if (cpufreq_driver->exit)
> cpufreq_driver->exit(policy);
> clk_put(policy->clk);
> // policy->clk is a wild pointer
> ...
> ^
> |
> Another process access
> __cpufreq_get
> cpufreq_verify_current_freq
> cpufreq_generic_get
> // acces wild pointer of policy->clk;
> |
> |
> out_offline_policy: |
> cpufreq_policy_free(policy); |
> // deleted here, and will wait for no body reference
> cpufreq_policy_put_kobj(policy);
> }
>
> Signed-off-by: Schspa Shi <schspa@xxxxxxxxx>
> ---
> drivers/cpufreq/cpufreq.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 80f535cc8a75..0d58b0f8f3af 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1533,8 +1533,6 @@ static int cpufreq_online(unsigned int cpu)
> for_each_cpu(j, policy->real_cpus)
> remove_cpu_dev_symlink(policy, get_cpu_device(j));
>
> - up_write(&policy->rwsem);
> -
> out_offline_policy:
> if (cpufreq_driver->offline)
> cpufreq_driver->offline(policy);
> @@ -1543,6 +1541,9 @@ static int cpufreq_online(unsigned int cpu)
> if (cpufreq_driver->exit)
> cpufreq_driver->exit(policy);
>
> + cpumask_clear(policy->cpus);
> + up_write(&policy->rwsem);

This is simply buggy as now an error out to out_offline_policy or
out_exit_policy will try to release a semaphore which was never taken
in the first place. This works fine only if we failed late, i.e. via
out_destroy_policy.

The very first thing we need to do now is revert this patch. Lemme
send a patch for that and you can send a fresh fix over that once you
have a stable fix.

--
viresh