Re: [RFC] [PATCH] timer: Added usleep[_range][_interruptable] timer

From: Patrick Pannuto
Date: Mon Jun 28 2010 - 14:03:31 EST


>
> You could test residency in specific power states .. Since you want to
> test if your reducing power consumption .. However, I'd replace a ton
> more of these udelay()'s , I don't think you'll get a decent results
> with out that.
>

Ok.

>
> The busy waits your replacing are small in length on average .. If you
> have timers that trigger in small intervals then your not going to
> increase residency in any _lower_ power states .. It's possible that you
> could increase residency in the top level power state, but it seem like
> it would be really marginal .. You need to show that udelay()'s have an
> outside of noise impact on something ..
>

Ok.

>
> I think you need to do some more research on what your actually doing to
> the system. From what your showing us one could make a lot of different
> arguments as to what this change will actually do. You really need some
> sort of test that doesn't leave a lot of room for argument.
>

Well here we go then.

First, I would note that IMO is both telling and remarkable that the
following (almost) worked and allowed the system to boot successfully:

(a little long since you can cut off half of it and just get a list of
affected / candidate files, the following is a glorified sed that
replaces all instances of udelay with usleep for udelay's >= 50us)

git grep 'udelay([[:digit:]]\+)' |
perl -F"[\(\)]" -anl -e 'print if $F[1] >= 50' |
cut -d: -f1 |
uniq |
grep -v staging |
xargs perl -i -wlpe
's/udelay/usleep/ if m/udelay\((\d+)\)/g && $1 >= 50'

The usb driver and one power routine that was in a critical section had
to be reverted; everything else ran successfully.


*** MEASUREMENT TECHNIQUE ***

/proc/msm_pm_stats will dump information on residency in power states as
the system is running; dumps were taken 15s, 30s, 60s, 120s, 180s, and
240s from boot. 10 trials were performed for each build. A trial was run
replacing udelay >1000, >500, >100, >=100, >=50.

origin: 0 changes (0 total)
over1000: 106 changes (106 total)
over500: 193 changes (299 total)
over100: 302 changes (601 total)
equal100: 423 changes (1024 total)
equal50: 220 changes (1244 total)

Now, a concern is obviously that these changes hit drivers that are not
used in this system, or that many will be hit only once in boot /
initialization / etc; this shows plainly in the results and will be
addressed.


*** RESULTS ***

Complete results are attached (please forgive the ugly stats.py, it
evolved; it's functional for the task at hand, but certainly not pretty).

--> Long results also at end of email

>From the final results, after 240s:

TIME STATE #TIMES CALLED TIME IN STATE | DELTA FROM ORG

////////////////////////////////////////////////////////////////////////////////
=== origin (10 samples) =================================== RUNTIME: 241.535 ===
240 idle-spin -- 2.3 1.45001e-05
240 not-idle -- 88368.2 67.8311703321
240 idle-request -- 88367.2 23337.6322655
240 idle-wfi -- 88364.9 172.296383696
=== over1000 (10 samples) ====== (delta: -0.00999999998604) RUNTIME: 241.525 ===
240 idle-spin -- 3.1 1.91666e-05 | 4.6665e-06
240 not-idle -- 88723.9 65.6361809172 | -2.1949894149
240 idle-request -- 88722.9 23311.9855603 | -25.6467051707
240 idle-wfi -- 88719.8 174.493487111 | 2.1971034149
=== over500 (10 samples) ======== (delta: -0.0339999999851) RUNTIME: 241.501 ===
240 idle-spin -- 2.3 1.88334e-05 | 4.3333e-06
240 not-idle -- 88636.3 67.0242803241 | -0.806890008
240 idle-request -- 88635.3 23280.1632631 | -57.469002442
240 idle-wfi -- 88633.0 173.077055869 | 0.7806721723
=== over100 (10 samples) ======== (delta: -0.0539999999921) RUNTIME: 241.481 ===
240 idle-spin -- 0.9 6.6666e-06 | -7.8335e-06
240 not-idle -- 88599.0 67.190273164 | -0.6408971681
240 idle-request -- 88597.9 23253.4797828 | -84.1524827638
240 idle-wfi -- 88597.0 172.884529866 | 0.5881461694
=== equal100 (10 samples) ================== (delta: -0.025) RUNTIME: 241.51 ===
240 idle-spin -- 1.4 9.5002e-06 | -4.9999e-06
240 not-idle -- 88685.9 66.5067348407 | -1.3244354914
240 idle-request -- 88684.9 23294.4341497 | -43.1981158192
240 idle-wfi -- 88683.5 173.60379269 | 1.3074089936
=== equal50 (10 samples) ======== (delta: -0.0289999999804) RUNTIME: 241.506 ===
240 idle-spin -- 2.0 9.6664e-06 | -4.8337e-06
240 not-idle -- 88537.4 65.8619214556 | -1.9692488765
240 idle-request -- 88536.4 22979.3665406 | -358.265724952
240 idle-wfi -- 88534.4 174.247270576 | 1.9508868794


There appears to be very little change outside of noise from origin through
equal100, however, once equal50 is reached, there is a noticeable change.
Further investigation revealed that before equal100, none of the changes
could be guaranteed to hit the testing target (which, if nothing else,
provides a nice baseline for noise). The equal100 test included 3 changes
that could positively be sure to hit target and the equal50 test another
4 changes. The equal50 test was the only test that seems to conclusively
show immediate benefit, but it *does* show benefit.

(here "positively sure" are arch-specific drivers that changed, easy to
search for; it's likely that more did hit the target along the way)


*** CONCLUSIONS ***

I submit that the concept of usleep introduces no harm and does have the
potential for benefit. I further submit that as processors get faster, the
benefit from a usleep versus a udelay will only improve. Finally, I would note
that the success of the "sed" implies to me that many users of udelay would
prefer the semantics of a usleep call; it is simply unavailable and msleep is
simply too imprecise for callers to use, which is in and of itself a case
for a usleep API.

I invite comments, questions, etc, and am happy to run more tests; however,
I would like to request that if you for any reason believe this data to be
imprecise or insufficient that you please provide me with the *exact* test you
would like me to run and would like to see data from, since it does take a
fair amount of time to run these tests.

Finally, I would like to re-iterate my point that unnecessary busy-waiting,
while it *can* be pre-empted, *is* wasteful and __if it can easily be avoided,
it should be__.

-Pat


*** LONG RESULTS ***

==> Attached is pm_stats.tar.gz, it contains results as TEST_TRIALNO,
flash.sh to run tests (./flash.sh pm_stats TESTNAME) run from an Android
product directory, and stats.py to generate statistics. Forgive stats.py,
it is ugly and an abuse of python.

TIME STATE #TIMES CALLED TIME IN STATE | DELTA FROM ORG

////////////////////////////////////////////////////////////////////////////////
=== origin (10 samples) ==================================== RUNTIME: 15.253 ===
15 idle-spin -- 1.1 4.8332e-06
15 not-idle -- 59265.1 9.3842608833
15 idle-request -- 59262.2 21892.7762788
15 idle-wfi -- 59261.1 4.4754141812
=== over1000 (10 samples) ======== (delta: -0.00299999999115) RUNTIME: 15.25 ===
15 idle-spin -- 0.7 3.6666e-06 | -1.1666e-06
15 not-idle -- 59243.9 9.3891611869 | 0.0049003036
15 idle-request -- 59237.1 21897.8631197 | 5.0868408908
15 idle-wfi -- 59236.4 4.4665700454 | -0.0088441358
=== over500 (10 samples) ======== (delta: -0.00899999999674) RUNTIME: 15.244 ===
15 idle-spin -- 0.5 2.8334e-06 | -1.9998e-06
15 not-idle -- 59267.2 9.3819648583 | -0.002296025
15 idle-request -- 59263.2 21894.5342115 | 1.75793271321
15 idle-wfi -- 59262.7 4.4662633725 | -0.0091508087
=== over100 (10 samples) ======== (delta: -0.00200000000186) RUNTIME: 15.251 ===
15 idle-spin -- 0.1 1.1666e-06 | -3.6666e-06
15 not-idle -- 59222.3 9.4040358354 | 0.0197749521
15 idle-request -- 59213.6 21893.1330305 | 0.356751724303
15 idle-wfi -- 59213.5 4.4462678975 | -0.0291462837
=== equal100 (10 samples) ======= (delta: -0.00599999999395) RUNTIME: 15.247 ===
15 idle-spin -- 0.1 5e-07 | -4.3332e-06
15 not-idle -- 59311.6 9.392722999 | 0.0084621157
15 idle-request -- 59306.0 21900.290705 | 7.5144261487
15 idle-wfi -- 59305.9 4.4606828991 | -0.0147312821
=== equal50 (10 samples) ====================== (delta: 0.0) RUNTIME: 15.253 ===
15 idle-spin -- 0.6 1.8333e-06 | -2.9999e-06
15 not-idle -- 59267.0 9.4182486803 | 0.033987797
15 idle-request -- 59261.4 21613.5706545 | -279.205624353
15 idle-wfi -- 59260.8 4.4418992194 | -0.0335149618
////////////////////////////////////////////////////////////////////////////////
=== origin (10 samples) ==================================== RUNTIME: 30.494 ===
30 idle-spin -- 1.2 5.1666e-06
30 not-idle -- 69462.2 21.80567204
30 idle-request -- 69462.2 22568.1921778
30 idle-wfi -- 69461.0 6.8173021886
=== over1000 (10 samples) ======= (delta: -0.00100000001257) RUNTIME: 30.493 ===
30 idle-spin -- 1.1 6.3332e-06 | 1.1666e-06
30 not-idle -- 69436.0 21.7969753263 | -0.0086967137
30 idle-request -- 69436.0 22564.1587592 | -4.0334186002
30 idle-wfi -- 69434.9 6.8029330699 | -0.0143691187
=== over500 (10 samples) ======== (delta: -0.00700000001816) RUNTIME: 30.487 ===
30 idle-spin -- 0.6 3.1667e-06 | -1.9999e-06
30 not-idle -- 69474.6 21.7808043443 | -0.0248676957
30 idle-request -- 69474.6 22564.0631218 | -4.1290559666
30 idle-wfi -- 69474.0 6.7958443852 | -0.0214578034
=== over100 (10 samples) ========= (delta: 0.00099999998929) RUNTIME: 30.495 ===
30 idle-spin -- 0.2 1.6666e-06 | -3.5e-06
30 not-idle -- 69379.1 21.8042313156 | -0.0014407244
30 idle-request -- 69379.1 22551.153217 | -17.0389607525
30 idle-wfi -- 69378.9 6.7590274145 | -0.0582747741
=== equal100 (10 samples) ======= (delta: -0.00300000001443) RUNTIME: 30.491 ===
30 idle-spin -- 0.2 1e-06 | -4.1666e-06
30 not-idle -- 69455.7 21.8058526573 | 0.0001806173
30 idle-request -- 69455.7 22570.70418 | 2.51200219339
30 idle-wfi -- 69455.5 6.7603640729 | -0.0569381157
=== equal50 (10 samples) ======== (delta: -0.00100000002421) RUNTIME: 30.493 ===
30 idle-spin -- 0.9 3.8332e-06 | -1.3334e-06
30 not-idle -- 69449.7 21.8319953111 | 0.0263232711
30 idle-request -- 69449.7 22271.8053346 | -296.386843177
30 idle-wfi -- 69448.8 6.7478177546 | -0.069484434
////////////////////////////////////////////////////////////////////////////////
=== origin (10 samples) ===================================== RUNTIME: 60.78 ===
60 idle-spin -- 1.9 1.16667e-05
60 not-idle -- 79366.8 49.3036608463
60 idle-request -- 79366.3 22907.1705373
60 idle-wfi -- 79364.4 9.0110567161
=== over1000 (10 samples) ========= (delta: 0.0139999999898) RUNTIME: 60.794 ===
60 idle-spin -- 2.0 1.36668e-05 | 2.0001e-06
60 not-idle -- 79493.9 49.8914426181 | 0.5877817718
60 idle-request -- 79491.4 22894.3020036 | -12.8685336944
60 idle-wfi -- 79489.4 9.0156314459 | 0.0045747298
=== over500 (10 samples) ========== (delta: -0.010999999987) RUNTIME: 60.769 ===
60 idle-spin -- 1.5 1.14999e-05 | -1.668e-07
60 not-idle -- 79513.5 49.7980203079 | 0.4943594616
60 idle-request -- 79512.6 22869.995713 | -37.1748243636
60 idle-wfi -- 79511.1 9.0027399223 | -0.0083167938
=== over100 (10 samples) ========= (delta: -0.0229999999865) RUNTIME: 60.757 ===
60 idle-spin -- 0.8 6.3332e-06 | -5.3335e-06
60 not-idle -- 79399.4 49.8599212971 | 0.5562604508
60 idle-request -- 79396.8 22849.5517755 | -57.6187617949
60 idle-wfi -- 79396.0 8.9517647664 | -0.0592919497
=== equal100 (10 samples) ======= (delta: -0.00699999999488) RUNTIME: 60.773 ===
60 idle-spin -- 1.0 6.1667e-06 | -5.5e-06
60 not-idle -- 79508.7 49.7887594666 | 0.4850986203
60 idle-request -- 79506.3 22878.2886511 | -28.8818862207
60 idle-wfi -- 79505.3 8.9810057637 | -0.0300509524
=== equal50 (10 samples) ======== (delta: -0.00799999999581) RUNTIME: 60.772 ===
60 idle-spin -- 1.4 5.9998e-06 | -5.6669e-06
60 not-idle -- 79401.8 49.7245356251 | 0.4208747788
60 idle-request -- 79401.8 22589.3334706 | -317.837066708
60 idle-wfi -- 79400.4 8.9224572725 | -0.0885994436
////////////////////////////////////////////////////////////////////////////////
=== origin (10 samples) =================================== RUNTIME: 121.018 ===
120 idle-spin -- 2.3 1.45001e-05
120 not-idle -- 84534.3 65.4829273526
120 idle-request -- 84533.2 23073.5171825
120 idle-wfi -- 84530.9 54.1388846978
=== over1000 (10 samples) ======== (delta: 0.0149999999907) RUNTIME: 121.033 ===
120 idle-spin -- 2.9 1.76667e-05 | 3.1666e-06
120 not-idle -- 84899.7 62.9539904388 | -2.5289369138
120 idle-request -- 84898.6 23058.6542567 | -14.8629258454
120 idle-wfi -- 84895.7 56.6840665995 | 2.5451819017
=== over500 (10 samples) ======= (delta: -0.00999999999767) RUNTIME: 121.008 ===
120 idle-spin -- 2.1 1.61667e-05 | 1.6666e-06
120 not-idle -- 84810.4 64.5704918038 | -0.9124355488
120 idle-request -- 84809.3 23020.7240229 | -52.7931596084
120 idle-wfi -- 84807.2 55.0429809068 | 0.904096209
=== over100 (10 samples) ======== (delta: -0.0209999999963) RUNTIME: 120.997 ===
120 idle-spin -- 0.9 6.6666e-06 | -7.8335e-06
120 not-idle -- 84773.7 64.6287104755 | -0.8542168771
120 idle-request -- 84772.7 23002.4330668 | -71.0841157664
120 idle-wfi -- 84771.8 54.9635557389 | 0.8246710411
=== equal100 (10 samples) ======= (delta: 0.00600000000559) RUNTIME: 121.024 ===
120 idle-spin -- 1.4 9.5002e-06 | -4.9999e-06
120 not-idle -- 84843.8 63.8751059888 | -1.6078213638
120 idle-request -- 84842.8 23042.7777274 | -30.7394551461
120 idle-wfi -- 84841.4 55.7466510555 | 1.6077663577
=== equal50 (10 samples) ======= (delta: -0.00099999998929) RUNTIME: 121.017 ===
120 idle-spin -- 2.0 9.6664e-06 | -4.8337e-06
120 not-idle -- 84767.4 63.0424664541 | -2.4404608985
120 idle-request -- 84766.4 22740.2895937 | -333.227588835
120 idle-wfi -- 84764.4 56.5740535877 | 2.4351688899
////////////////////////////////////////////////////////////////////////////////
=== origin (10 samples) =================================== RUNTIME: 181.273 ===
180 idle-spin -- 2.3 1.45001e-05
180 not-idle -- 86458.1 66.7356990125
180 idle-request -- 86457.1 23202.3433604
180 idle-wfi -- 86454.8 113.126771849
=== over1000 (10 samples) ========= (delta: 0.010999999987) RUNTIME: 181.284 ===
180 idle-spin -- 3.0 1.83333e-05 | 3.8332e-06
180 not-idle -- 86831.3 64.5482361081 | -2.1874629044
180 idle-request -- 86830.3 23181.3150713 | -21.028289176
180 idle-wfi -- 86827.3 115.331773087 | 2.2050012376
=== over500 (10 samples) ======== (delta: -0.0250000000117) RUNTIME: 181.248 ===
180 idle-spin -- 2.2 1.68334e-05 | 2.3333e-06
180 not-idle -- 86733.7 65.9390319914 | -0.7966670211
180 idle-request -- 86732.7 23151.5750952 | -50.7682652808
180 idle-wfi -- 86730.5 113.915960535 | 0.7891886856
=== over100 (10 samples) ======== (delta: -0.0339999999968) RUNTIME: 181.239 ===
180 idle-spin -- 0.9 6.6666e-06 | -7.8335e-06
180 not-idle -- 86696.8 66.0923128287 | -0.6433861838
180 idle-request -- 86695.7 23122.6406168 | -79.7027435974
180 idle-wfi -- 86694.8 113.744699867 | 0.6179280181
=== equal100 (10 samples) ======= (delta: -0.0060000000056) RUNTIME: 181.267 ===
180 idle-spin -- 1.4 9.5002e-06 | -4.9999e-06
180 not-idle -- 86765.3 65.4099056755 | -1.325793337
180 idle-request -- 86764.2 23156.7172921 | -45.6260683097
180 idle-wfi -- 86762.8 114.457775355 | 1.3310035056
=== equal50 (10 samples) ======= (delta: -0.00799999998418) RUNTIME: 181.265 ===
180 idle-spin -- 2.0 9.6664e-06 | -4.8337e-06
180 not-idle -- 86665.1 64.770143973 | -1.9655550395
180 idle-request -- 86664.1 22851.7003915 | -350.642968977
180 idle-wfi -- 86662.1 115.098642558 | 1.971870709
////////////////////////////////////////////////////////////////////////////////
=== origin (10 samples) =================================== RUNTIME: 241.535 ===
240 idle-spin -- 2.3 1.45001e-05
240 not-idle -- 88368.2 67.8311703321
240 idle-request -- 88367.2 23337.6322655
240 idle-wfi -- 88364.9 172.296383696
=== over1000 (10 samples) ====== (delta: -0.00999999998604) RUNTIME: 241.525 ===
240 idle-spin -- 3.1 1.91666e-05 | 4.6665e-06
240 not-idle -- 88723.9 65.6361809172 | -2.1949894149
240 idle-request -- 88722.9 23311.9855603 | -25.6467051707
240 idle-wfi -- 88719.8 174.493487111 | 2.1971034149
=== over500 (10 samples) ======== (delta: -0.0339999999851) RUNTIME: 241.501 ===
240 idle-spin -- 2.3 1.88334e-05 | 4.3333e-06
240 not-idle -- 88636.3 67.0242803241 | -0.806890008
240 idle-request -- 88635.3 23280.1632631 | -57.469002442
240 idle-wfi -- 88633.0 173.077055869 | 0.7806721723
=== over100 (10 samples) ======== (delta: -0.0539999999921) RUNTIME: 241.481 ===
240 idle-spin -- 0.9 6.6666e-06 | -7.8335e-06
240 not-idle -- 88599.0 67.190273164 | -0.6408971681
240 idle-request -- 88597.9 23253.4797828 | -84.1524827638
240 idle-wfi -- 88597.0 172.884529866 | 0.5881461694
=== equal100 (10 samples) ================== (delta: -0.025) RUNTIME: 241.51 ===
240 idle-spin -- 1.4 9.5002e-06 | -4.9999e-06
240 not-idle -- 88685.9 66.5067348407 | -1.3244354914
240 idle-request -- 88684.9 23294.4341497 | -43.1981158192
240 idle-wfi -- 88683.5 173.60379269 | 1.3074089936
=== equal50 (10 samples) ======== (delta: -0.0289999999804) RUNTIME: 241.506 ===
240 idle-spin -- 2.0 9.6664e-06 | -4.8337e-06
240 not-idle -- 88537.4 65.8619214556 | -1.9692488765
240 idle-request -- 88536.4 22979.3665406 | -358.265724952
240 idle-wfi -- 88534.4 174.247270576 | 1.9508868794

Attachment: pm_stats.tar.gz
Description: GNU Zip compressed data