Re: [Intel-gfx] [PATCH] drm/i915: use udelay for very short delays

From: Nicholas Mc Guire
Date: Thu Dec 15 2016 - 05:58:30 EST


On Thu, Dec 15, 2016 at 10:27:59AM +0100, Daniel Vetter wrote:
> On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> > > On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@xxxxxxxxx> wrote:
> > > > Even on fast systems a 2 microsecond delay is most likely more efficient
> > > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > > > change this to a udelay(2).
> > >
> > > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> > > so this boils down to which is the better use of CPU. We could probably
> > > relax the max delay more if that was helpful. But I'm not immediately
> > > sold on "is most likely more efficient" which sounds like a gut feeling.
> > >
> > > I'm sorry it's not clear in my other reply that I do appreciate
> > > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> > > convinced udelay() is the answer.
> >
> > So one reason why we unconditionally use *sleep variants is the
> > might_sleep check. Because in the past people have used udelay and mdelay,
> > those delays had to be increased a lot because hw, and at the same time
> > someone added users of these functions to our irq helper, resulting in irq
> > handling times measures in multiple ms. That's not good.
> >
> > So until someone can demonstrate that there's a real benefit (which let's
> > be honest, for modeset code, will never be the case) I very highly prefer
> > to use *sleep* functions. They prevent some silly things from happening by
> > accident. Micro-optimizing modeset code and hampering maitainability in
> > the process is not good.
>
> Also, the entire premise seems backwards: usleep_range is inefficient for
> certain parameter ranges and better replaced with udelay. That makes
> sense.
>
> But why exactly do we not fix udelay_range then, but instead do a cocci
> job crawling through all the thousands of callers? Just fix usleep(_range)
> to use udelay for very small values (and still keep the might_sleep check
> ofc) if that's more efficient!

its actually not thousands more like a few dozen:

usleep_range(min,max) in linux-stable 4.9.0

1648 calls total
1488 pass numeric values only (90.29%)
27 min below 10us (1.81%)
40 min above 10ms (2.68%)
min out of spec 4.50%
76 preprocessor constants (4.61%)
1 min below 10us (1.31%)
8 min above 10ms (10.52%)
min out of spec 11.84%
85 expressions (5.15%)
1(0) min below 10us (1.50%)*
6(2) min above 10ms (7.50%)*
min out of spec 9.0%
Errors:
23 where min==max (1.39%)
0 where max < min (0.00%)

Total:
Bugs: 6.48%-10.70%*
Crit: 3.09%-3.15%* (min < 10 and min==max and max < min)
Detectable by coccinelle:
Bugs: 74/103 (71.8%)
Crit: 50/52 (96.1%)

*based on estimats as runtime values not known.


there is no usleep() as noted in Documentation/timers/timers-howto.txt
- Why not usleep?
On slower systems, (embedded, OR perhaps a speed-
stepped PC!) the overhead of setting up the hrtimers
for usleep *may* not be worth it. Such an evaluation
will obviously depend on your specific situation, but
it is something to be aware of.

and it suggests to use different API for different ranges - sounds sane
to me and seems to cover the needs of drivers.

thx!
hofrat