Re: [PATCH v3] cpufreq: fix race on cpufreq online

From: Rafael J. Wysocki
Date: Thu May 12 2022 - 06:49:35 EST


On Thu, May 12, 2022 at 8:56 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
>
> On 11-05-22, 15:19, Rafael J. Wysocki wrote:
> > On Wed, May 11, 2022 at 2:59 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > > > Hmm, I don't think depending on the values of policy->cpus is a good idea to be
> > > > honest. This design is inviting bugs to come in at another place. We need a
> > > > clear flag for this, a new flag or something like policy_list.
> >
> > Why?
>
> Because it doesn't mean anything unless we have code elsewhere which checks this
> specifically. It should be fine though after using policy_is_inactive() in
> show/store as you suggested, which I too tried to do in a patch :)
>
> > > > Also I see the same bug happening while the policy is removed. The kobject is
> > > > put after the rwsem is dropped.
> >
> > This shouldn't be a problem because of the wait_for_completion() in
> > cpufreq_policy_put_kobj(). It is known that cpufreq_sysfs_release()
> > has run when cpufreq_policy_put_kobj() returns, so it is safe to free
> > the policy then.
>
> I agree to that, but the destruction of stuff happens right in
> cpufreq_policy_free() where it starts removing the policy from the list and
> clears cpufreq_cpu_data. I don't know if it will break anything or not, but we
> should disallow any further sysfs operations once we have reached
> cpufreq_policy_free().

Well, would there be a problem with moving the
cpufreq_policy_put_kobj() call to the front of cpufreq_policy_free()?
If we did that, we'd know that everything could be torn down safely,
because nobody would be holding references to the policy any more.

> > TBH, I'm not sure why show() doesn't check policy_is_inactive() under the rwsem.
>
> I agree, both show/store should have it.
>
> > Moreover, I'm not sure why the locking dance in store() is necessary.
>
> commit fdd320da84c6 ("cpufreq: Lock CPU online/offline in cpufreq_register_driver()")

I get that, but I'm wondering if locking CPU hotplug from store() is
needed at all. I mean, if we are in store(), we are holding an active
reference to the policy kobject, so the policy cannot go away until we
are done anyway. Thus it should be sufficient to use the policy rwsem
for synchronization.