Re: [PATCH] irq: Resolve that mask_irq/unmask_irq may not be called in pairs

From: Andy Shevchenko
Date: Thu Dec 14 2023 - 11:11:44 EST


On Thu, Dec 14, 2023 at 01:06:23PM +0300, Serge Semin wrote:
> On Thu, Dec 14, 2023 at 09:54:06AM +0800, xiongxin wrote:
> > 在 2023/12/13 22:59, Thomas Gleixner 写道:
> > > On Wed, Dec 13 2023 at 10:29, xiongxin wrote:
> > > > 在 2023/12/12 23:17, Thomas Gleixner 写道:
> > > > Sorry, the previous reply may not have clarified the BUG process. I
> > > > re-debugged and confirmed it yesterday. The current BUG execution
> > > > sequence is described as follows:
> > >
> > > It's the sequence how this works and it works correctly.
> > >
> > > Just because it does not work on your machine it does not mean that this
> > > is incorrect and a BUG.
> > >
> > > You are trying to fix a symptom and thereby violating guarantees of the
> > > core code.
> > >
> > > > That is, there is a time between the 1:handle_level_irq() and
> > > > 3:irq_thread_fn() calls for the 2:disable_irq() call to acquire the lock
> > > > and then implement the irq_state_set_disabled() operation. When finally
> > > > call irq_thread_fn()->irq_finalize_oneshot(), it cannot enter the
> > > > unmask_thread_irq() process.
> > >
> > > Correct, because the interrupt has been DISABLED in the mean time.
> > >
> > > > In this case, the gpio irq_chip irq_mask()/irq_unmask() callback pairs
> > > > are not called in pairs, so I think this is a BUG, but not necessarily
> > > > fixed from the irq core code layer.
> > >
> > > No. It is _NOT_ a BUG. unmask() is not allowed to be invoked when the
> > > interrupt is DISABLED. That's the last time I'm going to tell you that.
> > > Only enable_irq() can undo the effect of disable_irq(), period.
> > >
> > > > Next, when the gpio controller driver calls the suspend/resume process,
> > > > it is as follows:
> > > >
> > > > suspend process:
> > > > dwapb_gpio_suspend()
> > > > ctx->int_mask = dwapb_read(gpio, GPIO_INTMASK);
> > > >
> > > > resume process:
> > > > dwapb_gpio_resume()
> > > > dwapb_write(gpio, GPIO_INTMASK, ctx->int_mask);
> > >
> > > Did you actually look at the sequence I gave you?
> > >
> > > Suspend:
> > >
> > > i2c_hid_core_suspend()
> > > disable_irq(); <- Marks it disabled and eventually
> > > masks it.
> > >
> > > gpio_irq_suspend()
> > > save_registers(); <- Saves masked interrupt
> > >
> > > Resume:
> > >
> > > gpio_irq_resume()
> > > restore_registers(); <- Restores masked interrupt
> > >
> > > i2c_hid_core_resume()
> > > enable_irq(); <- Unmasks interrupt and removes the
> > > disabled marker
> > >
> > >
> > > Have you verified that this order of invocations is what happens on
> > > your machine?
> > >
> > > Thanks,
> > >
> > > tglx
> >
> > As described earlier, in the current situation, the irq_mask() callback of
> > gpio irq_chip is called in mask_irq(), followed by the disable_irq() in
> > i2c_hid_core_suspend(), unmask_irq() will not be executed.
> >
> > Then call enable_irq() in i2c_hid_core_resume(). Since gpio irq_chip does
> > not implement the irq_startup() callback, it ends up calling irq_enable().
> >
> > The irq_enable() function is then implemented as follows:
> >
> > irq_state_clr_disabled(desc);
> > if (desc->irq_data.chip->irq_enable) {
> > desc->irq_data.chip->irq_enable(&desc->irq_data);
> > irq_state_clr_masked(desc);
> > } else {
> > unmask_irq(desc);
> > }
> >
> > Because gpio irq_chip implements irq_enable(), unmask_irq() is not executed,
> > and gpio irq_chip's irq_unmask() callback is not called. Instead,
> > irq_state_clr_masked() was called to clear the masked flag.
> >
> > The irq masked behavior is actually controlled by the
> > irq_mask()/irq_unmask() callback function pairs in gpio irq_chip. When the
> > whole situation occurs, there is one more irq_mask() operation, or one less
> > irq_unmask() operation. This ends the i2c hid resume and the gpio
> > corresponding i2c hid interrupt is also masked.
> >
> > Please help confirm whether the current situation is a BUG, or suggest other
> > solutions to fix it.
>
> I has just been Cc'ed to this thread from the very start of the thread
> so not sure whether I fully understand the problem. But a while ago an
> IRQ disable/undisable-mask/unmask-unpairing problem was reported for
> DW APB GPIO driver implementation, but in a another context though:
> https://lore.kernel.org/linux-gpio/1606728979-44259-1-git-send-email-luojiaxing@xxxxxxxxxx/
> We didn't come to a final conclusion back then, so the patch got lost
> in the emails archive. Xiong, is it related to the problem in your
> case?

>From what explained it sounds to me that GPIO driver has missing part and
this seems the missing patch (in one or another form). Perhaps we can get
a second round of review,

--
With Best Regards,
Andy Shevchenko