Re: [PATCH] doc: add note on usleep_range range

From: Nicholas Mc Guire
Date: Sat Jan 07 2017 - 14:43:04 EST


On Tue, Dec 27, 2016 at 10:56:26PM +0100, Pavel Machek wrote:
> On Tue 2016-12-13 04:58:43, Nicholas Mc Guire wrote:
> > usleep_range() with a delta of 0 makes no sense and only prevents the
> > timer subsystem from optimizing interrupts. As any user of usleep_range()
> > is in non-atomic context the timer jitter is in the range of 10s of
> > microseconds anyway.
> >
> > This adds a note making it clear that a range of 0 is a bad idea.
> >
> > Signed-off-by: Nicholas Mc Guire <hofrat@xxxxxxxxx>
> > ---
> >
> > as of 4.9.0 there are about 20 cases of usleep_ranges() that have
> > min==max and none of them really look like they are necessary, so
> > it does seem like a relatively common misunderstanding worth
> > noting in the documentation.
> >
> > Patch is against 4.9.0 (localversion-next is 20161212)
> >
> > Documentation/timers/timers-howto.txt | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/Documentation/timers/timers-howto.txt b/Documentation/timers/timers-howto.txt
> > index 038f8c7..b5cdf82 100644
> > --- a/Documentation/timers/timers-howto.txt
> > +++ b/Documentation/timers/timers-howto.txt
> > @@ -93,6 +93,13 @@ NON-ATOMIC CONTEXT:
> > tolerances here are very situation specific, thus it
> > is left to the caller to determine a reasonable range.
> >
> > + A range of 0, that is usleep_range(100,100) or the
> > + like, do not make sense as this code is in a
> > + non-atomic section and a system can not be expected
> > + to have jitter 0. For any non-RT code any delta
>
> Would it be possible to fix english here?

Agreed, that is crappy language - my bad - will fix - thanks!

>
> "to have zero jitter" at least. I believe it is "does not".
>
> I don't see how atomic vs. non-atomic context makes difference. There
> are sources of jitter that affect atomic context...

The relevance is that while there is jitter in atomic context, it can
be quite small (depending on your hardware and the specifics of the
system config), but in non-atomic context the jitter is so large that
it makes no relevant difference whether you give usleep_range() a few
microseconds of slack or not.
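
To make that concrete, here is a minimal sketch of the two call
patterns the added note is about - usleep_range() is the real API,
the 200/250 values are only illustrative:

	/* delta of 0: a single expiry point, so the timer subsystem has
	 * no room to coalesce this wakeup with other pending timers -
	 * no benefit in non-atomic context
	 */
	usleep_range(200, 200);

	/* a modest slack allows the expiry to be merged with other
	 * timers; the scheduling jitter of non-atomic context is larger
	 * than the extra 50 us anyway
	 */
	usleep_range(200, 250);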

>
> > + less than 50 microseconds probably is only preventing
> > + timer subsystem optimization but providing no benefit.
>
> And I don't trust you here. _If_ it prevents timer optimalization,
> _then_ it provides benefit, at least in the average case.
>
here is the data:

System: Intel Core i7 CPU 920 @ 2.67GHz (octo-core)
OS: Debian 8.1 (but that's quite irrelevant)
Kernel: 4.10-rc2 (localversion-next next-20170106)
config: x86_64_defconfig (Voluntary | Preempt)

Test setup - popped this into a kernel module and just
brute-force load/unload it in a loop - not very elegant
but it does the job.

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/ktime.h>

/* time one usleep_range() call; load/unload the module to repeat */
static int __init usleep_test_init(void)
{
	ktime_t now, last;
	unsigned long min, max;

	min = 200;
	max = 250;
	last = ktime_get();
	usleep_range(min, max);
	now = ktime_get();
	printk("%llu\n", ktime_to_ns(now) - ktime_to_ns(last));
	return 0;
}

static void __exit usleep_test_exit(void) { }

module_init(usleep_test_init);
module_exit(usleep_test_exit);
MODULE_LICENSE("GPL");

Results:

usleep_range() 5000 samples - idle system
       100,100             200,200             190,200
Min.   :188481      Min.   :201917      Min.   :197793
1st Qu.:207062      1st Qu.:207057      1st Qu.:207051
Median :207139      Median :207133      Median :207133
Mean   :207254      Mean   :207233      Mean   :207244
3rd Qu.:207341      3rd Qu.:207262      3rd Qu.:207610
Max.   :225340      Max.   :214222      Max.   :214885

100,100 to 200,200 is maybe a relevant impact for
some systems with respect to the outliers, but
mean and median are almost the same; for
190,200 to 200,200 there is no statistically
significant difference with respect to performance.
But idle-system optimization is not that interesting
for most systems anyway. Note that the timestamps
before and after also have jitter, so only part of
the jitter can be attributed to usleep_range() itself.
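
As a side note on the method, that residual timestamp jitter could be
estimated with a variant of the init function above that takes the two
timestamps back to back without any sleep in between - just a sketch,
the function name is made up:

static int __init tstamp_test_init(void)
{
	ktime_t t1, t2;

	/* no sleep in between: any difference printed here is
	 * measurement overhead, not usleep_range() jitter
	 */
	t1 = ktime_get();
	t2 = ktime_get();
	printk("%llu\n", ktime_to_ns(t2) - ktime_to_ns(t1));
	return 0;
}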

On a loaded box:
Load here means that 8 "find / | grep bla" loops were started
(not pinned to any particular core) and then the
usleep_range() test was run 5000 times.

Same setup as above, but this time we differentiate between
PREEMPT and PREEMPT_VOLUNTARY.

CONFIG_PREEMPT_VOLUNTARY=y
usleep_range() 5000 samples - load ~ 8
        100,200              190,200              200,200
Min.   :  107812     Min.   :  203307     Min.   :  203432
1st Qu.:  558221     1st Qu.:  557490     1st Qu.:  510356
Median : 1123425     Median : 1121939     Median : 1123316
Mean   : 1103718     Mean   : 1100965     Mean   : 1100542
3rd Qu.: 1541986     3rd Qu.: 1531478     3rd Qu.: 1517414
Max.   : 8979183     Max.   :13765789     Max.   :12476136

CONFIG_PREEMPT=y
usleep_range() 5000 samples - load ~ 8
        100,200              190,200              200,200
Min.   :  115321     Min.   :  203963     Min.   :  203864
1st Qu.:  510296     1st Qu.:  451479     1st Qu.:  548131
Median : 1148660     Median : 1062576     Median : 1145228
Mean   : 1193449     Mean   : 1079379     Mean   : 1154728
3rd Qu.: 1601552     3rd Qu.: 1378622     3rd Qu.: 1570742
Max.   :12936192     Max.   :12346313     Max.   :13858732

So for a loaded system it simply makes no relevant difference
whether you grant the subsystem a 10 microsecond range or not. In
fact, if one wanted 200 us and allowed for 200,250, it would
be quite hard to notice the difference.

usleep_range(200,200)     vs    usleep_range(200,250)
Min.   :  203864                Min.   :  214003
1st Qu.:  548131                1st Qu.:  520436
Median : 1145228                Median : 1138698
Mean   : 1154728                Mean   : 1201871
3rd Qu.: 1570742                3rd Qu.: 1581952
Max.   :13858732                Max.   :12491198

I would call the difference insignificant - plotted as curves you
can hardly tell the distributions apart. As soon as you are looking
at more than a single task to optimize, the difference would
probably disappear completely.

thx!
hofrat