[RFC] [PATCH] timer: Added usleep[_range][_interruptible] timer

From: Patrick Pannuto
Date: Wed Jun 23 2010 - 15:22:27 EST


*** INTRO ***

As discussed here ( http://lkml.org/lkml/2007/8/3/250 ), msleep(1) is not
precise enough for many drivers (yes, sleep precision is an unfair notion,
but consistently sleeping for roughly an order of magnitude longer than
requested is worth fixing). This patch adds a usleep API so that udelay
does not have to be used. Obviously not every udelay can be replaced
(those in atomic contexts or those used for simple bitbanging come to
mind), but there are many, many examples of

mydriver_write(...);
/* Wait for hardware to latch */
udelay(100);

in various drivers where a busy-wait loop is neither beneficial nor
necessary, but msleep simply does not provide enough precision, so people
fall back on busy-waiting.
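
With usleep_range, the same pattern can sleep instead of spin; a sketch of
the intended conversion (mydriver_write and the 200us upper bound are
illustrative, not from any real driver):

mydriver_write(...);
/* Wait for hardware to latch; any wakeup in [100, 200)us is
 * acceptable, so the scheduler is free to coalesce it with an
 * already-pending timer instead of programming a new one. */
usleep_range(100, 200);

The extra argument is the point: the caller declares how much slack it can
tolerate, and that slack is what lets the kernel avoid a dedicated wakeup.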


*** SOME QUANTIFIABLE (?) NUMBERS ***

My focus is on Android, so I started by replacing the udelays in
drivers/i2c/busses/i2c-msm.c:

267: udelay(100) --> usleep_range(100, 200)
283: udelay(100) --> usleep_range(100, 200)
333: udelay(20) --> usleep(20)

and measured wakeups after Android was completely booted and stable
across 100 trials (throwing away the first) like so:

for i in {1..100}; do
        echo "=== Trial $i" >> test.txt;
        echo 1 > /proc/timer_stats; sleep 10; echo 0 > /proc/timer_stats;
        cat /proc/timer_stats >> test.txt;
        sleep 2s;
done

then averaged the results to see if there was any benefit:

=== ORIGINAL (99 samples) ========================================= ORIGINAL ===
Avg: 188.760000 wakeups in 9.911010 secs (19.045486 wkups/sec) [18876 total]
Wakeups: Min - 179, Max - 208, Mean - 190.666667, Stdev - 6.601194

=== USLEEP (99 samples) ============================================= USLEEP ===
Avg: 188.200000 wakeups in 9.911230 secs (18.988561 wkups/sec) [18820 total]
Wakeups: Min - 181, Max - 213, Mean - 190.101010, Stdev - 6.950757

While not particularly rigorous, the results seem to indicate that there may be
some benefit from pursuing this.


*** HOW MUCH BENEFIT? ***

Somewhat arbitrarily choosing 100us as the cut-off between udelay and
usleep:

git grep 'udelay([[:digit:]]\+)' |
        perl -F"[\(\)]" -anl -e 'print if $F[1] >= 100' | wc -l

yields 1093 on Linus's tree. There are 313 instances of >= 1000us, and
still another 53 of >= 10000us of busy-wait! (If AVOID_POPS is configured
in, the es18xx driver will udelay(100000), i.e. *0.1 seconds of busy-wait*.)
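
Even a coarse conversion of these extreme cases recovers real CPU time; a
hypothetical before/after (values illustrative, not taken from a specific
driver):

/* before: the CPU spins for the full 10ms */
udelay(10000);

/* after: the task sleeps; any wakeup in [10, 20]ms will do */
usleep_range(10000, 20000);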


*** SUMMARY ***

I believe the usleep functions provide a tangible benefit, but I would like
some input before attempting a more thorough udelay removal. Also, what is
a reasonable cutoff point between udelay and usleep? I found two dated
(2007) papers discussing the overhead of a context switch:

http://www.cs.rochester.edu/u/cli/research/switch.pdf
IBM eServer, dual 2.0GHz Pentium Xeon; 512 KB L2, cache line 128B
Linux 2.6.17, Red Hat 9, gcc 3.2.2 (-O0)
3.8 us / context switch

http://delivery.acm.org/10.1145/1290000/1281703/a3-david.pdf
ARMv5, ARM926EJ-S on an OMAP1610 (set to 120MHz clock)
Linux 2.6.20-rc5-omap1
48 us / context switch

However, there is more to consider than just context switching (naively, a
sleep costs about two switches, suggesting a floor near 10us on the x86 box
above and near 100us on the OMAP, but that ignores scheduling latency and
cache effects); does anyone know an appropriate cut-off, or an appropriate
way to measure and find one?


Finally, to address the potential question of why this is not built on top
of do_nanosleep: usleep_range seems very valuable for power applications,
since many of these delays are simply waiting for something to complete,
and I would prefer that they not instigate a wake-up of their own. Also,
do_nanosleep appears to be built as an interface for the user-space
nanosleep function; it did not seem like a good fit here.
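
For concreteness, a minimal sketch of how usleep_range could sit on top of
hrtimers instead (an assumption about the shape of the patch, not the final
code; schedule_hrtimeout_range() is the existing kernel primitive that
accepts a slack window):

/* Sketch for kernel/timer.c; relies on <linux/delay.h>,
 * <linux/hrtimer.h>, and <linux/sched.h>. */
static void __sched do_usleep_range(unsigned long min, unsigned long max)
{
	ktime_t kmin;
	unsigned long delta;

	/* Earliest acceptable expiry... */
	kmin = ktime_set(0, min * NSEC_PER_USEC);
	/* ...plus (max - min) of slack, so any timer already due
	 * inside [min, max) can service this wakeup as well. */
	delta = (max - min) * NSEC_PER_USEC;
	schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
}

void __sched usleep_range(unsigned long min, unsigned long max)
{
	__set_current_state(TASK_UNINTERRUPTIBLE);
	do_usleep_range(min, max);
}

Because the whole range is handed to schedule_hrtimeout_range() as slack,
the sleeper never forces a wakeup of its own when some other timer fires
within its window.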

-Pat