Re: [PATCH] cpufreq: fix race on cpufreq online

From: Schspa Shi
Date: Mon May 09 2022 - 11:06:21 EST


Viresh Kumar <viresh.kumar@xxxxxxxxxx> writes:

> I had to dig the old patch first before starting to review what your
> next one does.
>
> On 21-04-22, 03:15, Schspa Shi wrote:
>> When cpufreq online failed, policy->cpus are not empty while
>> cpufreq sysfs file available, we may access some data freed.
>>
>> Take policy->clk as an example:
>>
>> static int cpufreq_online(unsigned int cpu)
>> {
>> ...
>> // policy->cpus != 0 at this time
>> down_write(&policy->rwsem);
>> ret = cpufreq_add_dev_interface(policy);
>
> Please keep some code to help understand where we went from here. I am
> sure you meant that we will error out in this case, but you removed
> the relevant code.
>

Yes, I will add this to the next version of patch.

>> up_write(&policy->rwsem);
>>
>> return 0;
>>
>> out_destroy_policy:
>> for_each_cpu(j, policy->real_cpus)
>> remove_cpu_dev_symlink(policy, get_cpu_device(j));
>> up_write(&policy->rwsem);
>> ...
>> out_exit_policy:
>> if (cpufreq_driver->exit)
>> cpufreq_driver->exit(policy);
>> clk_put(policy->clk);
>> // policy->clk is a wild pointer
>> ...
>> ^
>> |
>> Another process access
>> __cpufreq_get
>> cpufreq_verify_current_freq
>> cpufreq_generic_get
>> // acces wild pointer of policy->clk;
>> |
>> |
>> out_offline_policy: |
>> cpufreq_policy_free(policy); |
>> // deleted here, and will wait for no body reference
>> cpufreq_policy_put_kobj(policy);
>> }
>>
>> Signed-off-by: Schspa Shi <schspa@xxxxxxxxx>
>> ---
>> drivers/cpufreq/cpufreq.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index 80f535cc8a75..0d58b0f8f3af 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1533,8 +1533,6 @@ static int cpufreq_online(unsigned int cpu)
>> for_each_cpu(j, policy->real_cpus)
>> remove_cpu_dev_symlink(policy, get_cpu_device(j));
>>
>> - up_write(&policy->rwsem);
>> -
>> out_offline_policy:
>> if (cpufreq_driver->offline)
>> cpufreq_driver->offline(policy);
>> @@ -1543,6 +1541,9 @@ static int cpufreq_online(unsigned int cpu)
>> if (cpufreq_driver->exit)
>> cpufreq_driver->exit(policy);
>>
>> + cpumask_clear(policy->cpus);
>> + up_write(&policy->rwsem);
>
> This is simply buggy as now an error out to out_offline_policy or
> out_exit_policy will try to release a semaphore which was never taken
> in the first place. This works fine only if we failed late, i.e. via
> out_destroy_policy.
>

I am very sorry for this oversight.

To fix this issue, there is no need to move cpufreq_driver->exit(policy)
and cpufreq_driver->offline(policy) to inside of &policy->rwsem.
I made this change because they are inside of &policy->rwsem write lock
at cpufreq_offline. I think we should keep offline & exit call inside of
policy->rwsem for better symmetry.

static int cpufreq_offline(unsigned int cpu)
{
...
down_write(&policy->rwsem);
...
/*
* Perform the ->offline() during light-weight tear-down, as
* that allows fast recovery when the CPU comes back.
*/
if (cpufreq_driver->offline) {
cpufreq_driver->offline(policy);
} else if (cpufreq_driver->exit) {
cpufreq_driver->exit(policy);
policy->freq_table = NULL;
}

unlock:
up_write(&policy->rwsem);
return 0;
}

> The very first thing we need to do now is revert this patch. Lemme
> send a patch for that and you can send a fresh fix over that once you
> have a stable fix.

For the next version of the stable fix, I'd be willing to keep exit and
offline calls inside of policy->rwsem. But it's OK for me to keep offline
& exit calls outside of policy->rwsem.

---
BRs


Schspa Shi