Re: [PATCH 1/2] sc16is7xx: Fix for multi-channel stall

From: Andreas FÃrber
Date: Sat Sep 29 2018 - 10:05:01 EST


Hi Phil and Greg,

Am 12.09.18 um 16:31 schrieb Phil Elwell:
> The SC16IS752 is a dual-channel device. The two channels are largely
> independent, but the IRQ signals are wired together as an open-drain,
> active low signal which will be driven low while either of the
> channels requires attention, which can be for significant periods of
> time until operations complete and the interrupt can be acknowledged.
> In that respect it is should be treated as a true level-sensitive IRQ.
>
> The kernel, however, needs to be able to exit interrupt context in
> order to use I2C or SPI to access the device registers (which may
> involve sleeping). Therefore the interrupt needs to be masked out or
> paused in some way.
>
> The usual way to manage sleeping from within an interrupt handler
> is to use a threaded interrupt handler - a regular interrupt routine
> does the minimum amount of work needed to triage the interrupt before
> waking the interrupt service thread. If the threaded IRQ is marked as
> IRQF_ONESHOT the kernel will automatically mask out the interrupt
> until the thread runs to completion. The sc16is7xx driver used to
> use a threaded IRQ, but a patch switched to using a kthread_worker
> in order to set realtime priorities on the handler thread and for
> other optimisations. The end result is non-threaded IRQ that
> schedules some work then returns IRQ_HANDLED, making the kernel
> think that all IRQ processing has completed.
>
> The work-around to prevent a constant stream of interrupts is to
> mark the interrupt as edge-sensitive rather than level-sensitive,
> but interpreting an active-low source as a falling-edge source
> requires care to prevent a total cessation of interrupts. Whereas
> an edge-triggering source will generate a new edge for every interrupt
> condition a level-triggering source will keep the signal at the
> interrupting level until it no longer requires attention; in other
> words, the host won't see another edge until all interrupt conditions
> are cleared. It is therefore vital that the interrupt handler does not
> exit with an outstanding interrupt condition, otherwise the kernel
> will not receive another interrupt unless some other operation causes
> the interrupt state on the device to be cleared.
>
> The existing sc16is7xx driver has a very simple interrupt "thread"
> (kthread_work job) that processes interrupts on each channel in turn
> until there are no more. If both channels are active and the first
> channel starts interrupting while the handler for the second channel
> is running then it will not be detected and an IRQ stall ensues. This
> could be handled easily if there was a shared IRQ status register, or
> a convenient way to determine if the IRQ had been deasserted for any
> length of time, but both appear to be lacking.
>
> Avoid this problem (or at least make it much less likely to happen)
> by reducing the granularity of per-channel interrupt processing
> to one condition per iteration, only exiting the overall loop when
> both channels are no longer interrupting.
>
> Signed-off-by: Phil Elwell <phil@xxxxxxxxxxxxxxx>
> ---
> drivers/tty/serial/sc16is7xx.c | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)

These two patches seem to be applied in linux-next tree, but are lacking
a Fixes: header for backporting to affected stable trees.

openSUSE Tumbleweed's 4.18 appears to be affected, and I didn't see it
in linux.git for upcoming 4.19 either.

Can the commit message still be updated to get this fixed everywhere?

Which is the offending commit, this one from 2016? The only other 2018
change seems an unrelated error handling cleanup.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/tty/serial/sc16is7xx.c?id=04da73803c05dc1150ccc31cbf93e8cd56679c09

Also, the above commit message confuses me as to how after this commit
the driver should be used: Should the interrupt still be falling-edge,
as documented in the DT bindings, or do user/documentation need to
switch to level-triggered now?

Thanks,
Andreas

--
SUSE Linux GmbH, Maxfeldstr. 5, 90409 NÃrnberg, Germany
GF: Felix ImendÃrffer, Jane Smithard, Graham Norton
HRB 21284 (AG NÃrnberg)