Re: [PATCH] arc: make sure __delay() never gets executed with 0 loops

From: Alexey Brodkin
Date: Wed Feb 24 2016 - 09:11:40 EST

Hi Vineet,

On Wed, 2016-02-24 at 05:05 +0000, Vineet Gupta wrote:
> On Monday 15 February 2016 10:07 PM, Alexey Brodkin wrote:
> > Current implementation of __delay() function uses so-called
> > zero-delay loops. And the only condition to exit that loop is
> > LP_COUNT (loop count register) = 1 (but not 0 as it might be easily
> > imagined).
> So u can fix this better by doing a, but....
> > So if our calculation of "loops" gives 0 (and that is pretty possible
> > given result of multiplication being >> 32) then zero-delay loop
> > mechanism starts with LP_COUNT=0 and it ends up decrementing LP_COUNT
> > while staying in the loop effectively producing close to infinite delay
> > instead of very short one.
> >
> > I bumped into it with AXS101 + external DDR controller and caches
> > disabled. In that case I've got very small
> > loops_per_jiffy=0xf00:
> I understand this gives you grief, but the code is doing exactly what it is asked to.
> Since the system is slow, You are getting only 0xf00 (3840) loop iterations in 10ms.
> So if you want say a delay of 1 micro-sec, you will need to loop for 3840 / 10000
> ~ 0 loops
> This all assumes our lpj computation is correct - otherwise that needs fixing too.
> Anyways I think for genuine cases where the number of loops is indeed computed to
> 0 because caller was passing too small a value, it is better to wait for looong
> time to catch the bugger rather than silently returning. This is one of the cases
> where disease is better than the cure !

Ok, but see, delays (even those of just a few usecs) might be justified by
hardware requirements (e.g. some peripheral needs at least 10 usecs between
this and that operation). So a driver will use a completely correct value for
the delay, but on slow hardware (or especially in simulation) we'll get the
kernel virtually stuck for no obvious reason.
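
To put numbers on that, here is a rough user-space sketch of the loop count
arithmetic. It is not the kernel code itself: the helper name and SKETCH_HZ
are made up for illustration, HZ = 100 is assumed from the 10 ms jiffy
mentioned above, and 4295 is assumed as the usual approximation of
2^32 / 10^6 that goes together with the ">> 32".

```c
#include <stdint.h>

/* Assumption: HZ = 100, i.e. the 10 ms jiffy from the discussion above. */
#define SKETCH_HZ 100ULL

/*
 * Hypothetical helper, not a kernel function: number of delay loops for
 * "usecs" microseconds given "lpj" loops per jiffy.
 * 4295 ~= 2^32 / 10^6, so the final ">> 32" scales the product back down.
 * The 64-bit product can wrap for very large inputs; ignored in this sketch.
 */
static uint32_t sketch_udelay_loops(uint32_t usecs, uint32_t lpj)
{
    uint64_t scaled = (uint64_t)usecs * 4295ULL * SKETCH_HZ * lpj;

    return (uint32_t)(scaled >> 32);
}
```

With lpj = 0xf00 this gives 0 loops for a 1 usec request and 3 loops for a
10 usec request, so a perfectly legitimate short delay ends up as loops = 0.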

So frankly I don't like the proposal to keep the existing implementation.
And solution with looks much better to me.
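
For reference, the failure mode from the patch description can be modelled
in a few lines of plain C. This is only a toy model under stated assumptions:
the function name is hypothetical, the counter is 8 bits wide so that the
example finishes instantly (the real LP_COUNT is assumed to be 32 bits), and
the exit check follows the "LP_COUNT = 1" condition described above.

```c
#include <stdint.h>

/*
 * Toy model of a hardware loop that exits only when the counter equals 1.
 * Assumption: 8-bit counter for illustration; the same wrap-around with a
 * 32-bit counter would take about 2^32 iterations.
 */
static unsigned sketch_loop_iterations(uint8_t lp_count)
{
    unsigned iterations = 0;

    for (;;) {
        iterations++;        /* one pass through the loop body */
        if (lp_count == 1)   /* the only exit condition */
            break;
        lp_count--;          /* starting from 0 this wraps to 255 */
    }
    return iterations;
}
```

In this model sketch_loop_iterations(1) returns 1 and
sketch_loop_iterations(3) returns 3, while sketch_loop_iterations(0) runs
256 times, the full range of the counter, instead of returning right away.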