Re: [PATCH RFC] iopoll: allow for poll_timeout to back-off

From: Nicholas Mc Guire
Date: Thu Feb 23 2017 - 05:06:37 EST


On Thu, Feb 23, 2017 at 05:29:03PM +0900, Masahiro Yamada wrote:
> Hi Nicholas,
>
>
> 2017-02-14 1:52 GMT+09:00 Nicholas Mc Guire <der.herr@xxxxxxx>:
>
> >
> > One case in /drivers/mmc/host/sdhci-cadence.c has a timeout_us of 1 and
> > sleep_us of 0 which might actually be a bug as it means that the poll loop
> > would do a single read OP in many cases (this is non-atomic context) so it
> > would have to be robust for a single read, thus the poll_timeout might not
> > be suitable - not sure though - but thats a different problem.
>
> Sorry, I could not understand why you thought sdhci-cadence.c seemed a bug.
>
> The SDHCI_CDNS_HRS06_TUNE_UP is auto-cleared by the hardware.
> This will normally be done really soon (within 1us),
> but it may not be cleared if the hardware is in trouble.
> It is the intention of the code:
>
> readl_poll_timeout(reg, tmp, !(tmp & SDHCI_CDNS_HRS06_TUNE_UP),
> 0, 1);
>

I'm not sure myself, but as this would need to be robust for a single read
to be reliable, I did not see the point in using a poll loop with a timeout
of 1 microsecond. On x86 I tried this, and a timeout_us of 1 resulted in a
single read in more than 10% of the cases on an idle system and in almost
100% of the cases on loaded systems. So if the poll loop was introduced on
the assumption that just reading once immediately could miss the critical
event with reasonable probability, then timeout_us==1 does not change that.
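
For reference, the core of readx_poll_timeout() in include/linux/iopoll.h
is roughly the loop sketched below (a simplified paraphrase in plain C for
illustration, not the macro verbatim; poll_reg(), the mask condition and
the readl() accessor are just placeholders). With sleep_us == 0 and
timeout_us == 1 the ktime deadline can already be past after the first
read, so the loop frequently degenerates into a single read plus the final
re-read:

#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/io.h>
#include <linux/ktime.h>

static int poll_reg(void __iomem *addr, u32 mask,
		    unsigned long sleep_us, u64 timeout_us)
{
	ktime_t timeout = ktime_add_us(ktime_get(), timeout_us);
	u32 val;

	for (;;) {
		val = readl(addr);
		if (!(val & mask))		/* condition met */
			return 0;
		if (timeout_us &&
		    ktime_compare(ktime_get(), timeout) > 0) {
			val = readl(addr);	/* one last read */
			return (val & mask) ? -ETIMEDOUT : 0;
		}
		if (sleep_us)			/* 0 -> tight loop */
			usleep_range((sleep_us >> 2) + 1, sleep_us);
	}
}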


>
>
>
> > which results in min,max values for the initial iterations of:
> > min max
> > 0 1
> > 0 2
> > 1 4
> > 2 8
> > 4 16
> > 8 32
> > 16 64
> > 32 128
> > ...
>
>
> I notice you were sending this to me, but
> please notice I am not responsible for this file
> (include/linux/iopoll.h) in any way.

I included you because you showed up with:
scripts/get_maintainer.pl -f include/linux/iopoll.h

>
> Please take my comments with a grain of salt:
>
> With this patch, the sleep range is doubled in each iteration.
> Let's say this routine is called with delay_us=1, timeout_us=50,
> then it slept 16 us, then 32 us.
> If it sleeps 64 us in the next iteration,
> it ends up with sleeping 112 us (=16 + 32 + 64) in total
> where we know waiting 50 us is enough.

That is right - but the assumption of poll_timeout is that the
timeout case would actually be rare, and if you look at the actual
timings one gets from poll_timeout routines (which are based on
usleep_range()) on loaded systems, they overrun by hundreds of
microseconds very frequently.

But you are right that in the 50us case this is not ideal - the
focus I had was on the very long timeouts that in some cases were set
to up to 5 seconds with tight-loops (or close to tight-loops).

> So, the sleep range granularity may get bigger than
> users' intention.
>
> Probably, waiting too long is not a problem in most cases.
> If so, what is the meaning of the argument "sleep_us" after all?

The problem really is that on idle systems the sleep_us argument
works ok and the overruns are not that dramatic - but on loaded
systems the sleep_us routinely overruns significantly.

usleep_range() 5000 samples - idle system (all times in nanoseconds)
   100,200           200,200           190,200
Min.   :188481    Min.   :201917    Min.   :197793
1st Qu.:207062    1st Qu.:207057    1st Qu.:207051
Median :207139    Median :207133    Median :207133
Mean   :207254    Mean   :207233    Mean   :207244
3rd Qu.:207341    3rd Qu.:207262    3rd Qu.:207610
Max.   :225340    Max.   :214222    Max.   :214885

CONFIG_PREEMPT_VOLUNTARY=y
usleep_range() 5000 samples - load ~ 8
   100,200            190,200            200,200
Min.   :  107812   Min.   :  203307   Min.   :  203432
1st Qu.:  558221   1st Qu.:  557490   1st Qu.:  510356
Median : 1123425   Median : 1121939   Median : 1123316
Mean   : 1103718   Mean   : 1100965   Mean   : 1100542
3rd Qu.: 1541986   3rd Qu.: 1531478   3rd Qu.: 1517414
Max.   : 8979183   Max.   :13765789   Max.   :12476136

CONFIG_PREEMPT=y
usleep_range() 5000 samples - load ~ 8
   100,200            190,200            200,200
Min.   :  115321   Min.   :  203963   Min.   :  203864
1st Qu.:  510296   1st Qu.:  451479   1st Qu.:  548131
Median : 1148660   Median : 1062576   Median : 1145228
Mean   : 1193449   Mean   : 1079379   Mean   : 1154728
3rd Qu.: 1601552   3rd Qu.: 1378622   3rd Qu.: 1570742
Max.   :12936192   Max.   :12346313   Max.   :13858732
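
(For context: the numbers above are summary statistics over 5000 samples
each; the measurement code is not part of this mail, but it boils down to
something like the kernel-module sketch below - hypothetical names, not
the code actually used - timing each usleep_range() call with ktime and
summarizing the nanosecond deltas afterwards.)

#include <linux/delay.h>
#include <linux/ktime.h>
#include <linux/module.h>

static int __init usleep_probe_init(void)
{
	int i;

	for (i = 0; i < 5000; i++) {
		ktime_t start = ktime_get();

		usleep_range(100, 200);
		pr_info("usleep_range(100,200): %lld ns\n",
			ktime_to_ns(ktime_sub(ktime_get(), start)));
	}
	return 0;
}

static void __exit usleep_probe_exit(void)
{
}

module_init(usleep_probe_init);
module_exit(usleep_probe_exit);
MODULE_LICENSE("GPL");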

So really small sleep_us values make no sense I think - setting it to 0
(tight-loop) might be justified for a small timeout_us (say 10us),
but long busy-wait loops are bad (and probably not technically that
sensible either). If a busy-wait loop does not get the data/state
it wants within a few iterations then continuing to busy-wait is
nonsensical, and that is the intent of the exponential back-off solution:
do a few tight loops and then switch to sleeping delays.

It might be necessary to use a finer-grained stepping than
the brute-force *2 - but the principle, I think, would be better
than what is being done now.
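
To make the principle concrete, the back-off amounts to something like
the loop below (an illustrative sketch in plain C, not the RFC patch
verbatim; poll_reg_backoff() and the mask condition are placeholders).
The (delay >> 2, delay) pairs it produces are the min,max columns quoted
earlier in this thread:

#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/io.h>
#include <linux/ktime.h>

static int poll_reg_backoff(void __iomem *addr, u32 mask, u64 timeout_us)
{
	ktime_t timeout = ktime_add_us(ktime_get(), timeout_us);
	unsigned long delay = 1;	/* microseconds */
	u32 val;

	for (;;) {
		val = readl(addr);
		if (!(val & mask))		/* condition met */
			return 0;
		if (ktime_compare(ktime_get(), timeout) > 0) {
			val = readl(addr);	/* one last read */
			return (val & mask) ? -ETIMEDOUT : 0;
		}
		/* min,max: 0,1  0,2  1,4  2,8  4,16  8,32 ... */
		usleep_range(delay >> 2, delay);
		delay <<= 1;			/* exponential back-off */
	}
}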

and thanks for your comments !

thx!
hofrat