On 09-07-21, 11:37, Thara Gopinath wrote:
On 7/9/21 2:46 AM, Viresh Kumar wrote:
@@ -389,6 +503,10 @@ static int qcom_cpufreq_hw_cpu_exit(struct cpufreq_policy *policy)
dev_pm_opp_remove_all_dynamic(cpu_dev);
dev_pm_opp_of_cpumask_remove_table(policy->related_cpus);
+ if (data->lmh_dcvs_irq > 0) {
+ devm_free_irq(cpu_dev, data->lmh_dcvs_irq, data);
Why use the devm variants here and when requesting the irq?
Missed this one ?
+ cancel_delayed_work_sync(&data->lmh_dcvs_poll_work);
+ }
Please move this to qcom_cpufreq_hw_lmh_exit() or something.
Ok.
Now, with this sequence of disabling the interrupt, etc., I see a potential
problem.
CPU0                                    CPU1

qcom_cpufreq_hw_cpu_exit()
-> devm_free_irq();
                                        qcom_lmh_dcvs_poll()
                                        -> qcom_lmh_dcvs_notify()
                                          -> enable_irq()

-> cancel_delayed_work_sync();
What will happen if enable_irq() gets called after the irq has been freed?
Not sure, but it looks like you will hit this then from manage.c:
WARN(!desc->irq_data.chip,
     KERN_ERR "enable_irq before setup/request_irq: irq %u\n", irq))
?
You have a chicken-and-egg problem :)
Yes indeed! But it is also a very rare chicken-and-egg problem.
The scenario here is that the cpus are busy running a load that causes a
thermal overrun, so lmh is engaged. For this issue to be hit, the cpu must at
the same time be trying to exit/disable cpufreq.
Yes, it is a very specific case, but it needs to be resolved anyway. You don't
ever want to hit this :)
Calling cancel_delayed_work_sync() first could solve this issue, right?
cancel_delayed_work_sync() guarantees that the work is not pending on return,
even if it requeues itself. So once the delayed work is cancelled, the
interrupt can be safely freed. Thoughts?
I don't think even that would provide such guarantees to you here, as there is
a chance the work gets queued again because of an interrupt that triggers right
after you cancel the work.
The basic way of solving such issues is that once you cancel something, you need
to guarantee that it doesn't get triggered again, no matter what.
The problem I see here is with your design itself: the delayed work and the irq
can each enable the other, so no matter which one you disable first, it won't
be sufficient. You need to fix that design somehow.
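
To make the two orderings concrete, here is a minimal userspace sketch that
replays them in a single thread. It is not the driver code: struct lmh_model,
the model_*() helpers, the replay_*() functions and the "cancel" flag are all
hypothetical names invented for illustration. The flag is just one possible way
to stop both paths from re-arming each other, and in a real driver the check of
such a flag would also have to be serialized against teardown (for example with
a lock), which this model does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the irq/work pair; not the driver code. */
struct lmh_model {
	bool irq_requested;     /* false once the irq has been freed */
	bool irq_enabled;
	bool work_pending;      /* poll work queued or about to run */
	bool cancel;            /* assumed teardown flag checked by both paths */
	int enable_after_free;  /* times enable_irq() ran on a freed irq */
};

static void model_enable_irq(struct lmh_model *m)
{
	if (!m->irq_requested) {
		m->enable_after_free++;   /* the case the WARN() would report */
		return;
	}
	m->irq_enabled = true;
}

/* Interrupt path: mask the irq and hand over to the poll work. */
static void model_irq_handler(struct lmh_model *m)
{
	if (!m->irq_requested || !m->irq_enabled)
		return;
	m->irq_enabled = false;
	if (!m->cancel)
		m->work_pending = true;
}

/* Poll path: throttling is over, hand back to the interrupt. */
static void model_poll(struct lmh_model *m)
{
	m->work_pending = false;
	if (m->cancel)
		return;
	model_enable_irq(m);
}

/* Ordering from the diagram: irq freed while the poll work is running. */
int replay_free_irq_first(bool guarded)
{
	struct lmh_model m = { .irq_requested = true, .work_pending = true };

	m.cancel = guarded;        /* CPU0: set the flag before anything else */
	m.irq_requested = false;   /* CPU0: free the irq */
	model_poll(&m);            /* CPU1: poll work runs and re-enables */
	m.work_pending = false;    /* CPU0: cancel the delayed work */
	return m.enable_after_free;
}

/* Other ordering: work cancelled first, then an interrupt fires. */
int replay_cancel_work_first(bool guarded)
{
	struct lmh_model m = { .irq_requested = true, .irq_enabled = true };

	m.cancel = guarded;        /* CPU0: set the flag before anything else */
	m.work_pending = false;    /* CPU0: cancel the delayed work */
	model_irq_handler(&m);     /* interrupt fires and may requeue the work */
	m.irq_requested = false;   /* CPU0: free the irq */
	return m.work_pending ? 1 : 0;
}
```

Without the flag, replay_free_irq_first(false) counts one enable_irq() on a
freed irq, and replay_cancel_work_first(false) leaves the work queued after
teardown; with the flag set first, both return 0. A check such as
assert(replay_free_irq_first(true) == 0) can be used to confirm that.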