Re: Commit 554c8aa8ecad causing severe performance regression with pcc-cpufreq

From: Andreas Herrmann
Date: Wed Jul 18 2018 - 11:26:11 EST


I think I still owe some performance numbers to show what is wrong on
Linux systems using pcc-cpufreq after commit 554c8aa8ecad.

Below are results for kernbench tests (from the MMTests test suite).
That is just a kernel compile with different numbers of compile jobs.
The build time is measured; 5 runs are done for each configuration and
average values are calculated.
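
As a rough sketch of the idea only (this is not the actual
MMTests/kernbench implementation; the job counts and log file names
below are just for illustration), the measurement boils down to
something like:

  # time a kernel build for several job counts, 5 iterations each,
  # then average the elapsed times per job count
  for jobs in 2 4 8 16 30; do
      for i in 1 2 3 4 5; do
          make -s clean
          /usr/bin/time -f "%e" -o time-${jobs}-${i}.log \
              make -s -j${jobs} > /dev/null
      done
      cat time-${jobs}-*.log | awk -v j=${jobs} \
          '{ s += $1 } END { printf "jobs=%s mean elapsed=%.2fs\n", j, s/NR }'
  done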

I've restricted the maximum number of jobs to 30, which means that
tests were done for 2, 4, 8, 16, and 30 compile jobs. All tests were
bound to node 0. (I've used something like "numactl -N 0
./run-mmtests.sh --run-monitor <test_name>" to start them.)

Tests were done with kernel 4.18.0-rc3 on an HP DL580 Gen8 with Intel
Xeon E7-4890 CPUs and the latest BIOS installed. The system had 4
nodes with 15 cores per node (30 logical CPUs per node with HT
enabled). pcc-cpufreq was active and the ondemand governor was in use.
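
For reference, the active cpufreq driver and governor can be checked
per policy via sysfs, e.g.:

  # expected to report pcc-cpufreq / ondemand on this system
  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor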

I've tested with different numbers of online CPUs, which better
illustrates how idle online CPUs interfere with the compile load on
node 0 (due to the jitter caused by pcc-cpufreq and its locking).
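
The number of online CPUs can be varied via the sysfs CPU hotplug
interface; as a sketch (not necessarily the exact commands used here),
all CPUs of node 1 can be taken offline like this:

  # offline every CPU that belongs to NUMA node 1
  for c in /sys/devices/system/node/node1/cpu[0-9]*; do
      echo 0 > /sys/devices/system/cpu/${c##*/}/online
  done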

The arithmetic mean (Amean) of user/system/elapsed time in seconds and
the standard deviation (Stddev) for each subtest (= number of compile
jobs) are as follows:

(Nodes/CPUs)   N0/15CPUs   N01/30CPUs   N0/30CPUs   N01/60CPUs   N0123/120CPUs
Amean user-2 640.82 (0.00%) 675.90 (-5.47%) 789.03 (-23.13%) 1448.58 (-126.05%) 3575.79 (-458.01%)
Amean user-4 652.18 (0.00%) 689.12 (-5.67%) 868.19 (-33.12%) 1846.66 (-183.15%) 5437.37 (-733.73%)
Amean user-8 695.00 (0.00%) 732.22 (-5.35%) 1138.30 (-63.78%) 2598.74 (-273.92%) 7413.43 (-966.67%)
Amean user-16 653.94 (0.00%) 772.48 (-18.13%) 1734.80 (-165.29%) 2699.65 (-312.83%) 9224.47 (-1310.61%)
Amean user-30 634.91 (0.00%) 701.11 (-10.43%) 1197.37 (-88.59%) 1360.02 (-114.21%) 3732.34 (-487.85%)
Amean syst-2 235.45 (0.00%) 235.68 (-0.10%) 321.99 (-36.76%) 574.44 (-143.98%) 869.35 (-269.23%)
Amean syst-4 239.34 (0.00%) 243.09 (-1.57%) 345.07 (-44.18%) 621.00 (-159.47%) 1145.13 (-378.46%)
Amean syst-8 246.51 (0.00%) 254.83 (-3.37%) 387.49 (-57.19%) 786.63 (-219.10%) 1406.17 (-470.42%)
Amean syst-16 110.85 (0.00%) 122.21 (-10.25%) 408.25 (-268.31%) 644.41 (-481.36%) 1513.04 (-1264.99%)
Amean syst-30 82.74 (0.00%) 94.07 (-13.69%) 155.38 (-87.80%) 207.03 (-150.22%) 547.73 (-562.01%)
Amean elsp-2 625.33 (0.00%) 724.51 (-15.86%) 792.47 (-26.73%) 1537.44 (-145.86%) 3510.22 (-461.34%)
Amean elsp-4 482.02 (0.00%) 568.26 (-17.89%) 670.26 (-39.05%) 1257.34 (-160.85%) 3120.89 (-547.46%)
Amean elsp-8 267.75 (0.00%) 337.88 (-26.19%) 430.56 (-60.80%) 978.47 (-265.44%) 2321.91 (-767.18%)
Amean elsp-16 63.55 (0.00%) 71.79 (-12.97%) 224.83 (-253.79%) 403.94 (-535.65%) 1121.04 (-1664.09%)
Amean elsp-30 56.76 (0.00%) 62.82 (-10.69%) 66.50 (-17.16%) 124.20 (-118.84%) 303.47 (-434.70%)
Stddev user-2 1.36 (0.00%) 1.94 (-42.57%) 16.17 (-1090.46%) 119.09 (-8669.75%) 382.74 (-28085.60%)
Stddev user-4 2.81 (0.00%) 5.08 (-80.78%) 4.88 (-73.66%) 252.56 (-8881.80%) 1133.02 (-40193.16%)
Stddev user-8 2.30 (0.00%) 15.58 (-578.28%) 30.60 (-1232.63%) 279.35 (-12064.01%) 1050.00 (-45621.61%)
Stddev user-16 6.76 (0.00%) 25.52 (-277.80%) 78.44 (-1060.97%) 118.29 (-1650.94%) 724.11 (-10617.95%)
Stddev user-30 0.51 (0.00%) 1.80 (-249.13%) 12.63 (-2354.11%) 25.82 (-4915.43%) 1098.82 (-213365.28%)
Stddev syst-2 1.52 (0.00%) 2.76 (-81.04%) 3.98 (-161.58%) 36.35 (-2287.16%) 59.09 (-3781.09%)
Stddev syst-4 2.39 (0.00%) 1.55 (35.25%) 3.24 ( -35.92%) 51.51 (-2057.65%) 175.75 (-7262.43%)
Stddev syst-8 1.08 (0.00%) 3.70 (-241.40%) 6.83 (-531.33%) 65.80 (-5977.97%) 151.17 (-13864.10%)
Stddev syst-16 3.78 (0.00%) 5.58 (-47.53%) 4.63 ( -22.44%) 47.90 (-1167.18%) 99.94 (-2543.88%)
Stddev syst-30 0.31 (0.00%) 0.38 (-22.41%) 3.01 (-862.79%) 27.45 (-8688.85%) 137.94 (-44072.77%)
Stddev elsp-2 55.14 (0.00%) 55.04 (0.18%) 95.33 ( -72.90%) 103.91 (-88.45%) 302.31 (-448.29%)
Stddev elsp-4 60.90 (0.00%) 84.42 (-38.62%) 18.92 ( 68.94%) 197.60 (-224.46%) 323.53 (-431.24%)
Stddev elsp-8 16.77 (0.00%) 30.77 (-83.47%) 49.57 (-195.57%) 79.02 (-371.16%) 261.85 (-1461.28%)
Stddev elsp-16 1.99 (0.00%) 2.88 (-44.60%) 28.11 (-1311.79%) 101.81 (-5012.88%) 62.29 (-3028.36%)
Stddev elsp-30 0.65 (0.00%) 1.04 (-59.06%) 1.64 (-151.81%) 41.84 (-6308.81%) 75.37 (-11445.61%)
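
The percentages appear to be relative to the first column (N0/15CPUs),
i.e. (baseline - value) / baseline * 100, so negative values mean a
slowdown compared to the 15-CPU baseline. For example, Amean user-2
with 30 CPUs on nodes 0+1:

  awk 'BEGIN { printf "%.2f%%\n", (640.82 - 675.90) / 640.82 * 100 }'
  # prints -5.47%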

The overall test time (in seconds) for each MMTests invocation was as
follows (this also includes the number-of-CPUs configurations for
which I did not provide details above):

Nodes  CPUs        User     System    Elapsed
N0       15    17196.67    4807.98    7796.46
N01      30    18714.36    4970.89    9166.55
N0       30    30105.65    8533.95   11518.51
N012     45    19239.27    5136.97    9274.77
N0123    60    19505.35    5184.24    9030.39
N01      60    53089.39   16351.67   25465.38
N0123    75    22690.33    6135.29    9361.60
N0123    90    26731.06    7152.66   10677.63
N012     90    38131.74   10920.76   15633.49
N0123   105    47627.61   12362.39   18900.46
N0123   120   153424.99   32129.74   60908.28

The results for 120 online CPUs on nodes 0-3 illustrate what I meant
by the "system being almost unusable". When I tried to gather results
with kernel 4.17.5 and 120 CPUs, one iteration of kernbench (a single
kernel compile) with 2 jobs even took about 6 hours. Maybe that was an
extreme outlier, but I decided not to use that kernel (without
modifications) for further tests.


Andreas