Re: [PATCH v2 2/3] spi: pxa2xx: Prepare for edge-triggered interrupts

From: Jan Kiszka
Date: Thu Jan 19 2017 - 11:04:50 EST


On 2017-01-18 13:46, Mark Brown wrote:
> On Wed, Jan 18, 2017 at 10:33:07AM +0100, Jan Kiszka wrote:
>> On 2017-01-18 09:21, Robert Jarzmik wrote:
>
>>>>>> + while (1) {
>
>>>>> This bit worries me a bit, as this can be either :
>>>>> - hogging the SoC's CPU, endlessly running
>>>>> - or even worse, blocking the CPU for ever
>
>>>>> The question behind is, should this be done in a top-half, or moved to a irq
>>>>> thread ?
>
>>>> Every device with a broken interrupt source can hog CPUs, nothing
>>>> special with this one. If you don't close the loop in the handler
>>>> itself, you close it over the hardware retriggering the interrupt over
>>>> and over again.
>
>>> I'm not speaking of a broken interrupt source, I'm speaking of a broken code,
>>> such as in the handler, or broken status readback, or lack of understanding on
>>> the status register which may imply the while(1) to loop forever.
>
>>>> So, I don't see a point in offloading to a thread. The normal case is
>>>> some TX done (FIFO available) event followed by an RX event, then the
>>>> transfer is complete, isn't it?
>
>>> The point is if you stay forever in the while(1) loop, you can at least have a
>>> print a backtrace (LOCKUP_DETECTOR).
>
>> I won't consider "debugability" as a good reason to move interrupt
>> handlers into threads. There should be real workload that requires
>> offloading or specific prioritization.
>
> It's failure mitigation - you're translating a hard lockup into
> something that will potentially allow the system to soldier on which is
> likely to be less severe for the user as well as making things easier to
> figure out. If we're doing something like this I'd at least have a
> limit on how long we allow the interrupt to scream.
>

OK, OK, if that is the biggest worry, I can change the pattern from
loop-based to SSCR1-based, i.e. mask all interrupt sources once per
interrupt so that we enforce a falling edge. Fine.
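
To illustrate why the mask/unmask pulse helps, here is a minimal
userspace sketch, not driver code. It assumes a simplified model in
which the interrupt line is the OR of all enabled pending events and an
edge-triggered controller latches only rising edges. The names
ssp_model, EV_TX, EV_RX and run_scenario are hypothetical and do not
exist in spi-pxa2xx.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EV_TX 0x1u	/* hypothetical "TX FIFO available" event bit */
#define EV_RX 0x2u	/* hypothetical "RX data" event bit */

/* Simplified model, not the real pxa2xx register layout. */
struct ssp_model {
	uint32_t status;	/* pending events */
	uint32_t enable;	/* stands in for the interrupt enable bits of SSCR1 */
	bool line;		/* current level of the interrupt line */
	unsigned int edges;	/* rising edges latched by the controller */
};

/* Recompute the line level and count a rising edge if one occurred. */
static void update_line(struct ssp_model *m)
{
	bool level = (m->status & m->enable) != 0;

	if (level && !m->line)
		m->edges++;
	m->line = level;
}

static void write_enable(struct ssp_model *m, uint32_t val)
{
	m->enable = val;
	update_line(m);
}

static void raise_event(struct ssp_model *m, uint32_t bits)
{
	m->status |= bits;
	update_line(m);
}

static void clear_event(struct ssp_model *m, uint32_t bits)
{
	m->status &= ~bits;
	update_line(m);
}

/* Returns the number of edges latched during one handler invocation. */
unsigned int run_scenario(bool pulse)
{
	struct ssp_model m = { .enable = EV_TX | EV_RX };
	uint32_t saved;

	raise_event(&m, EV_TX);		/* first edge: handler is entered */
	assert(m.line && m.edges == 1);

	if (pulse) {
		saved = m.enable;
		write_enable(&m, saved & ~(EV_TX | EV_RX));	/* line drops */
		write_enable(&m, saved);			/* line rises again */
	}

	raise_event(&m, EV_RX);		/* arrives while TX is still pending */
	clear_event(&m, EV_TX);		/* handler services TX only, then returns */

	return m.edges;
}
```

In this model, run_scenario(false) leaves RX pending with the line stuck
high and no further edge, while run_scenario(true) latches a second edge
so the handler would run again and pick up RX.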

But now I'm looking at the driver, wondering who all is fiddling with
SSCR1, and under which conditions. There are a lot of RMW patterns, but
I do not see the locking pattern behind them. Are all RMW accesses run
only in the interrupt handler context? Unlikely, at least with the
dmaengine in the loop.
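
The hazard with unlocked RMW accesses can be sketched as follows. This
is a deterministic userspace illustration under assumed names (BIT_A,
BIT_B and the plain variable sscr1 are hypothetical stand-ins, not the
driver's register accessors). It replays one bad interleaving by hand
instead of using real concurrency.

```c
#include <stdint.h>

#define BIT_A 0x1u	/* hypothetical bit set by context A */
#define BIT_B 0x2u	/* hypothetical bit set by context B */

static uint32_t sscr1;	/* stands in for the hardware register */

/* Two read-modify-write sequences interleave; one update is lost. */
uint32_t racy_interleaving(void)
{
	uint32_t a_copy, b_copy;

	sscr1 = 0;
	a_copy = sscr1;			/* context A reads */
	b_copy = sscr1;			/* context B preempts A and reads */
	sscr1 = b_copy | BIT_B;		/* B writes its update */
	sscr1 = a_copy | BIT_A;		/* A resumes and writes a stale copy */

	return sscr1;			/* BIT_B has been overwritten */
}

/* The same updates with each RMW completing before the next starts. */
uint32_t serialized(void)
{
	sscr1 = 0;
	sscr1 = sscr1 | BIT_B;		/* B's complete RMW */
	sscr1 = sscr1 | BIT_A;		/* A's complete RMW */

	return sscr1;			/* both bits survive */
}
```

In the kernel, such serialization would typically come from a spinlock
taken with interrupts disabled around each RMW sequence; whether the
driver needs that depends on which contexts actually touch the register.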

Closing my eyes regarding this potential issue for now, the patch could
become as simple as

diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
index 0d10090..f9c2329 100644
--- a/drivers/spi/spi-pxa2xx.c
+++ b/drivers/spi/spi-pxa2xx.c
@@ -785,6 +785,9 @@ static irqreturn_t ssp_int(int irq, void *dev_id)
 	if (!(status & mask))
 		return IRQ_NONE;
 
+	pxa2xx_spi_write(drv_data, SSCR1, sccr1_reg & ~drv_data->int_cr1);
+	pxa2xx_spi_write(drv_data, SSCR1, sccr1_reg);
+
 	if (!drv_data->master->cur_msg) {
 		handle_bad_msg(drv_data);
 		/* Never fail */

Not efficient w.r.t. register accesses, but that's apparently not yet
a design goal anyway (I stumbled over the SSCR1 locking while
considering introducing a cache for that reg).

Jan

--
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
