Re: High interrupt latency with low power idle mode on i.MX6

From: Schrempf Frieder
Date: Wed May 27 2020 - 08:50:10 EST


On 27.05.20 13:53, Russell King - ARM Linux admin wrote:
> On Wed, May 27, 2020 at 10:39:12AM +0000, Schrempf Frieder wrote:
>> Hi,
>>
>> on our i.MX6UL/ULL boards running mainline kernels, we see an issue with
>> RS485 collisions on the bus. These are caused by the resetting of the
>> RTS signal being delayed after each transmission. The TXDC interrupt
>> takes several milliseconds to trigger and the slave on the bus already
>> starts to send a reply in the meantime.
>>
>> We found out that these delays only happen when the CPU is in "low power
>> idle" mode (ARM power off). When we disable cpuidle state 2 or put some
>> background load on the CPU everything works fine and the delays are gone.
>>
>> echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state2/disable
>>
>> It seems like also other interfaces (I2C, etc.) might be affected by
>> these increased latencies, we haven't investigated this more closely,
>> though.
>>
>> We currently apply a patch to our kernel, that disables low power idle
>> mode by default, but I'm wondering if there's a way to fix this
>> properly? Any ideas?
>
> Let's examine a basic fact about power management:
>
> The deeper PM modes that the system enters, the higher the latency to
> resume operation.
>
> So, I'm not surprised that you have higher latency when you allow the
> system to enter lower power modes. Does that mean that the kernel
> should not permit entering lower power modes - no, it's policy and
> application dependent.
>
> If the hardware is designed to use software to manage the RTS signal
> to control the RS485 receiver, then I'm afraid that your report really
> does not surprise me - throwing that at software to manage is a really
> stupid idea, but it seems lots of people do this. I've held this view
> since I worked on a safety critical system that used RS485 back in the
> 1990s (London Underground Jubilee Line Extension public address system.)
>
> So, what we have here is several things that come together to create a
> problem:
>
> 1) higher power savings produce higher latency to resume from
> 2) lack of hardware support for RS485 half duplex communication needing
> software support
> 3) an application that makes use of RS485 half duplex communication
> without disabling the higher latency power saving modes
>
> The question is, who should disable those higher latency power saving
> modes - the kernel, or userspace?
>
> The kernel knows whether it needs to provide software control of the
> RTS signal or not, but the kernel does not know the maximum permissible
> latency (which is application specific.) So, the kernel doesn't have
> all the information it needs. However, there is a QoS subsystem which
> may help you.
>
> There's also tweaks available via
> /sys/devices/system/cpu/cpu*/power/pm_qos_resume_latency_us
>
> which can be poked to configure the latency that is required, and will
> prevent the deeper PM states being entered.

Thanks for the detailed explanation. This all makes perfect sense to me.
I will keep in mind that we need to consider this aspect of power saving
vs. latency when designing systems and also that we need to provide the
information for the kernel to decide which of the two is more important.

Also thanks for pointing out the QoS subsystem. I'm not quite sure if it
would work for us to use pm_qos_resume_latency_us in our specific case.
The actual latency we observe is something like 2 to 3 milliseconds
longer with low power idle than without, but the exit_latency for low
power idle specified in the cpuidle driver is only 300 us.

So as far as I can see, even if we set pm_qos_resume_latency_us to
1000 us (which should be fast enough for RS485 to work properly), low
power idle wouldn't be disabled, because its advertised exit latency of
300 us is still below that limit.
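
To make the comparison concrete, here is a rough sketch that lists the
cpuidle states of cpu0 and marks which ones a given limit would still
permit. It assumes that a state remains eligible as long as its
advertised exit latency does not exceed the limit; the check_states
function and the CPUIDLE_DIR override are made up for illustration and
are not part of any kernel tooling.

```shell
#!/bin/sh
# Sketch only: compare each cpuidle state's advertised exit latency with
# a resume-latency limit in microseconds.
# Assumption: a state stays eligible while exit latency <= limit.
# Note: the sysfs attribute treats 0 as "no constraint"; this sketch
# expects a positive limit.
# CPUIDLE_DIR can be overridden to try the function off-target.
CPUIDLE_DIR="${CPUIDLE_DIR:-/sys/devices/system/cpu/cpu0/cpuidle}"

check_states() {
    limit_us="$1"
    for state in "$CPUIDLE_DIR"/state*; do
        # Skip entries without a readable latency file (e.g. no match).
        [ -r "$state/latency" ] || continue
        name=$(cat "$state/name" 2>/dev/null || echo "?")
        latency_us=$(cat "$state/latency")
        if [ "$latency_us" -le "$limit_us" ]; then
            verdict="allowed"
        else
            verdict="excluded"
        fi
        echo "$(basename "$state") $name exit_latency=${latency_us}us $verdict"
    done
}

# Default to the 1000 us limit discussed above.
check_states "${1:-1000}"
```

With a limit of 1000 us, a state advertising 300 us is reported as
allowed, which matches the concern above. Under the same assumption the
limit would have to be set below 300 us before that state is excluded.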

It's rather this discrepancy between the exit latency set in the driver
(300 us) and the additional 2 to 3 ms we see in reality that makes me
wonder if there's something I'm missing.