Re: [PATCH v1] hwmon: (lm90) Use edge-triggered interrupt

From: Dmitry Osipenko
Date: Thu Jun 17 2021 - 09:48:14 EST


17.06.2021 16:12, Guenter Roeck wrote:
> On Thu, Jun 17, 2021 at 10:11:19AM +0300, Dmitry Osipenko wrote:
>> 17.06.2021 03:12, Guenter Roeck wrote:
>>> On Wed, Jun 16, 2021 at 10:07:08PM +0300, Dmitry Osipenko wrote:
>>>> The LM90 driver uses level-based interrupt triggering. The interrupt
>>>> handler prints a warning message about the breached temperature and
>>>> quits. There is no way to stop the interrupt from re-triggering since
>>>> it's level-based, thus thousands of warning messages are printed per
>>>> second once the interrupt is triggered. Use an edge-triggered interrupt
>>>> in order to fix this.
>>>>
>>>> Fixes: 109b1283fb532 ("hwmon: (lm90) Add support to handle IRQ")
>>>> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx>
>>>> ---
>>>> drivers/hwmon/lm90.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c
>>>> index ebbfd5f352c0..ce8ebe60fcdc 100644
>>>> --- a/drivers/hwmon/lm90.c
>>>> +++ b/drivers/hwmon/lm90.c
>>>> @@ -1908,7 +1908,7 @@ static int lm90_probe(struct i2c_client *client)
>>>>  		dev_dbg(dev, "IRQ: %d\n", client->irq);
>>>>  		err = devm_request_threaded_irq(dev, client->irq,
>>>>  						NULL, lm90_irq_thread,
>>>> -						IRQF_TRIGGER_LOW | IRQF_ONESHOT,
>>>> +						IRQF_TRIGGER_FALLING | IRQF_ONESHOT,
>>>>  						"lm90", client);
>>>
>>> We can't do that. Problem is that many of the devices supported by this driver
>>> behave differently when it comes to interrupts. Specifically, the interrupt
>>> handler is supposed to reset the interrupt condition (i.e. reading the status
>>> register should reset it). If that is not the case for a specific chip,
>>> we'll have to update the code to address the problem for that specific chip.
>>> The above code would probably just generate a single interrupt while never
>>> resetting the interrupt condition, which is obviously not what we want to
>>> happen.
>>
>> The nct1008/72 datasheet [1] says that reading the status register
>> doesn't reset the interrupt until the temperature has returned to the
>> normal range, which is what I'm witnessing.
>>
>> [1] https://www.onsemi.com/pdf/datasheet/nct1008-d.pdf
>>
>> Page 10 "Status Register":
>>
>> "Reading the status register clears the five flags, Bit 6 to Bit 2,
>> provided the error conditions causing the flags to be set have gone
>> away. A flag bit can be reset only if the corresponding
>> value register contains an in-limit measurement or if the
>> sensor is good."
>>
>> So the interrupt handler doesn't actually stop the interrupt from
>> recurring, and the kernel log (KMSG) is instantly flooded with:
>>
>> ...
>> [ 217.484034] lm90 0-004c: temp2 out of range, please check!
>> [ 217.484569] lm90 0-004c: temp2 out of range, please check!
>> [ 217.485006] systemd-journald[179]: /dev/kmsg buffer overrun, some
>> messages lost.
>> [ 217.485109] lm90 0-004c: temp2 out of range, please check!
>> [ 217.485699] lm90 0-004c: temp2 out of range, please check!
>> [ 217.486235] lm90 0-004c: temp2 out of range, please check!
>> [ 217.486776] lm90 0-004c: temp2 out of range, please check!
>> [ 217.486874] systemd-journald[179]: /dev/kmsg buffer overrun, ...
>>
>> It's interesting that the very first version of the nct1008-support
>> patch used edge-triggered interrupt flags [2].
>>
>> [2] http://lkml.iu.edu/hypermail/linux/kernel/1104.1/01669.html
>>
> A lot of this depends on the chip and its wiring, as well as on chip
> configuration. Even for a specific chip there may be configuration
> dependencies. The interrupt configuration in situations like this
> should really be determined by devicetree configuration, and not
> be hardcoded. Is this a devicetree-based system? If so, there should
> be an entry for this chip pointing to the interrupt, and that entry
> should include a trigger mask. That mask should be set to edge
> triggered.

This is a device-tree based system, specifically the NVIDIA Tegra30
Nexus 7. Interrupt support was originally added to the lm90 driver by
Wei Ni, who works at NVIDIA and did it for the Tegra boards. The Tegra
device-trees do specify the trigger mask, but apparently they were all
cargo-culted and are wrong: they use IRQ_TYPE_LEVEL_HIGH, while it
should be IRQ_TYPE_EDGE_FALLING.
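
To illustrate why the trigger type matters here, below is a minimal
user-space C sketch (not kernel code). It assumes a simplified model in
which the ALERT line stays asserted (low) for as long as the temperature
is out of range, regardless of status-register reads, which is the
NCT1008 behaviour quoted from the datasheet above. The names `trigger`,
`count_handler_runs` and `line_low` are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two trigger types under discussion. */
enum trigger { TRIGGER_LEVEL_LOW, TRIGGER_EDGE_FALLING };

/*
 * Count how many times the interrupt handler would run across n
 * evaluation steps. line_low[i] is true while the active-low ALERT
 * line is asserted at step i. Assumption: reading the status register
 * in the handler does NOT deassert the line while the temperature is
 * still out of range.
 */
static int count_handler_runs(enum trigger trig, const bool *line_low, int n)
{
	int runs = 0;
	bool prev_low = false;	/* the line starts deasserted (high) */

	assert(line_low != NULL && n >= 0);

	for (int i = 0; i < n; i++) {
		bool fire;

		if (trig == TRIGGER_LEVEL_LOW)
			fire = line_low[i];		/* fires whenever the line is low */
		else
			fire = line_low[i] && !prev_low; /* fires only on high -> low */

		if (fire)
			runs++;	/* handler prints the warning and returns */

		prev_low = line_low[i];
	}

	return runs;
}
```

In this model, a line held low over several consecutive steps runs the
handler on every step in the level-triggered case, but only once per
out-of-range excursion in the falling-edge case.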

The IRQF trigger flag passed to devm_request_threaded_irq() overrides the
trigger mask specified in the device-tree. IIUC, the interrupt is used
only by OF-based devices, hence I think we could simply remove the IRQF
trigger flag from the code and fix the device-trees. Does that sound good
to you?
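
As a rough illustration of that override, here is a user-space C sketch
of the rule as I understand it: if the driver passes any IRQF_TRIGGER_*
bit, that wins; otherwise the trigger type configured from the
device-tree is kept. The constant values mirror include/linux/interrupt.h
but are redefined here so the snippet stands alone, and
`effective_trigger` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Redefined for this stand-alone sketch; values mirror the kernel's. */
#define IRQF_TRIGGER_RISING	0x00000001UL
#define IRQF_TRIGGER_FALLING	0x00000002UL
#define IRQF_TRIGGER_HIGH	0x00000004UL
#define IRQF_TRIGGER_LOW	0x00000008UL
#define IRQF_TRIGGER_MASK	(IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING | \
				 IRQF_TRIGGER_HIGH | IRQF_TRIGGER_LOW)
#define IRQF_ONESHOT		0x00002000UL

/*
 * Hypothetical helper: pick the trigger type that ends up in effect.
 * dt_trigger is the type taken from the device-tree (the IRQ_TYPE_*
 * values use the same bit layout as IRQF_TRIGGER_*), irqflags is what
 * the driver passes when requesting the interrupt.
 */
static unsigned long effective_trigger(unsigned long dt_trigger,
				       unsigned long irqflags)
{
	unsigned long requested = irqflags & IRQF_TRIGGER_MASK;

	/* An explicit trigger from the driver overrides the device-tree. */
	if (requested)
		return requested;

	/* No trigger bits from the driver: keep the device-tree setting. */
	return dt_trigger & IRQF_TRIGGER_MASK;
}
```

With the current flags (IRQF_TRIGGER_LOW | IRQF_ONESHOT) the helper
returns level-low no matter what the device-tree says; with only
IRQF_ONESHOT it returns whatever the device-tree specified, which is the
behaviour the proposal relies on.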