On Fri, Aug 04, 2023 at 12:51:47PM -0400, Waiman Long wrote:
On 8/4/23 12:28, Will Deacon wrote:Yes. To be honest, I think we've both spent far too much time trying to
From my point of view, the proper way to solve the problem is to reverse theI don't really follow, but waiting a few ms and trying again sounds likeRight, I should add code to handle this error condition. I think it can bestruct dmc620_pmu {It looks like this can bubble up to the probe() routine. Does the driver
@@ -423,9 +424,14 @@ static struct dmc620_pmu_irq *__dmc620_pmu_get_irq(int irq_num)
struct dmc620_pmu_irq *irq;
int ret;
- list_for_each_entry(irq, &dmc620_pmu_irqs, irqs_node)
- if (irq->irq_num == irq_num && refcount_inc_not_zero(&irq->refcount))
+ list_for_each_entry(irq, &dmc620_pmu_irqs, irqs_node) {
+ if (irq->irq_num != irq_num)
+ continue;
+ if (!irq->valid)
+ return ERR_PTR(-EAGAIN); /* Try again later */
core handle -EAGAIN coming back from a probe routine?
handled in dmc620_pmu_get_irq(). The important thing is to release the
mutex, wait a few ms and try again. What do you think?
a really nasty hack for something which doesn't appear to be constrained
by broken hardware. In other words, we got ourselves into this mess, so
we should be able to resolve it properly.
locking order. Since you don't to add a EXPORT statement to the core kernel
code, we will have to find a way around it by not holding the
dmc620_pmu_irqs_lock when cpuhp_state_add_instance_nocalls() is called.
Another alternative that I can think of is to add one more mutex that we
will hold just for the entirety of __dmc620_pmu_get_irq() and take
dmc620_pmu_irqs_lock only when the linked list is being modified. That will
eliminate the need to introduce arbitrary wait as other caller of
__dmc620_pmu_get_irq() will wait in the new mutex. Will this work for you?
fix this (and I admire your persistence!), so adding a mutex to make it
"obviously" correct sounds like the right thing to me. We can look at
optimisations later if anybody cares.