Re: [RFC PATCH v2 00/17] Core scheduling v2

From: Li, Aubrey
Date: Sun Apr 28 2019 - 22:17:08 EST

Next message: Jason Wang: "[PATCH] tuntap: synchronize through tfiles instead of numqueues"
Previous message: Lu Baolu: "[PATCH v3 4/8] iommu/vt-d: Enable DMA remapping after rmrr mapped"
In reply to: Ingo Molnar: "Re: [RFC PATCH v2 00/17] Core scheduling v2"
Next in thread: Ingo Molnar: "Re: [RFC PATCH v2 00/17] Core scheduling v2"

On 2019/4/28 20:17, Ingo Molnar wrote:
>
> * Aubrey Li <aubrey.intel@xxxxxxxxx> wrote:
>
>> On Sun, Apr 28, 2019 at 5:33 PM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>>> So because I'm a big fan of presenting data in a readable fashion, here
>>> are your results, tabulated:
>>
>> I thought I tried my best to make it readable, but this one looks much better,
>> thanks, ;-)
>>>
>>> #
>>> # Sysbench throughput comparison of 3 different kernels at different
>>> # load levels, higher numbers are better:
>>> #
>>>
>>> .--------------------------------------|----------------------------------------------------------------.
>>> |  NA/AVX     vanilla-SMT    [stddev%] |coresched-SMT   [stddev%]   +/-  |   no-SMT    [stddev%]   +/-  |
>>> |--------------------------------------|----------------------------------------------------------------|
>>> |      1/1          508.5    [  0.2% ] |        504.7   [  1.1% ]   0.8% |    509.0    [  0.2% ]   0.1% |
>>> |      2/2         1000.2    [  1.4% ] |       1004.1   [  1.6% ]   0.4% |    997.6    [  1.2% ]   0.3% |
>>> |      4/4         1912.1    [  1.0% ] |       1904.2   [  1.1% ]   0.4% |   1914.9    [  1.3% ]   0.1% |
>>> |      8/8         3753.5    [  0.3% ] |       3748.2   [  0.3% ]   0.1% |   3751.3    [  0.4% ]   0.1% |
>>> |    16/16         7139.3    [  2.4% ] |       7137.9   [  1.8% ]   0.0% |   7049.2    [  2.4% ]   1.3% |
>>> |    32/32        10899.0    [  4.2% ] |      10780.3   [  4.4% ]  -1.1% |  10339.2    [  9.6% ]  -5.1% |
>>> |    64/64        15086.1    [ 11.5% ] |      14262.0   [  8.2% ]  -5.5% |  11168.7    [ 22.2% ] -26.0% |
>>> |  128/128        15371.9    [ 22.0% ] |      14675.8   [ 14.4% ]  -4.5% |  10963.9    [ 18.5% ] -28.7% |
>>> |  256/256        15990.8    [ 22.0% ] |      12227.9   [ 10.3% ] -23.5% |  10469.9    [ 19.6% ] -34.5% |
>>> '--------------------------------------|----------------------------------------------------------------'
>>>
>>> One major thing that sticks out is that if we compare the stddev numbers
>>> to the +/- comparisons then it's pretty clear that the benchmarks are
>>> very noisy: in all but the last row stddev is actually higher than the
>>> measured effect.
>>>
>>> So what does 'stddev' mean here, exactly? The stddev of multiple runs,
>>> i.e. measured run-to-run variance? Or is it some internal metric of the
>>> benchmark?
>>>
>>
>> The benchmark reports intermediate statistics once per second;
>> the raw log looks like this:
>> [ 11s ] thds: 256 eps: 14346.72 lat (ms,95%): 44.17
>> [ 12s ] thds: 256 eps: 14328.45 lat (ms,95%): 44.17
>> [ 13s ] thds: 256 eps: 13773.06 lat (ms,95%): 43.39
>> [ 14s ] thds: 256 eps: 13752.31 lat (ms,95%): 43.39
>> [ 15s ] thds: 256 eps: 15362.79 lat (ms,95%): 43.39
>> [ 16s ] thds: 256 eps: 26580.65 lat (ms,95%): 35.59
>> [ 17s ] thds: 256 eps: 15011.78 lat (ms,95%): 36.89
>> [ 18s ] thds: 256 eps: 15025.78 lat (ms,95%): 39.65
>> [ 19s ] thds: 256 eps: 15350.87 lat (ms,95%): 39.65
>> [ 20s ] thds: 256 eps: 15491.70 lat (ms,95%): 36.89
>>
>> I have a Python script that parses eps (events per second) and lat (latency)
>> out of the log and computes the average and stddev. (I can also draw a curve locally.)
>>
>> It is indeed noisy when the number of tasks is greater than the number of CPUs.
>> It's probably caused by frequent load balancing and context switching.
>
> Ok, so it's basically an internal workload noise metric, it doesn't
> represent the run-to-run noise.
>
> So it's the real stddev of the workload - but we don't know whether the
> measured performance figure is exactly in the middle of the runtime
> probability distribution.
>
>> Do you have any suggestions? Or any other information I can provide?
>
> Yeah, so we don't just want to know the "standard deviation" of the
> measured throughput values, but also the "standard error of the mean".
>
> I suspect it's pretty low, below 1% for all rows?

Hope this mailbox works for this...

.------------------------------------------------------------------------------------------------------------.
| NA/AVX   vanilla-SMT  [std% / sem%] | coresched-SMT  [std% / sem%]    +/- |   no-SMT  [std% / sem%]    +/- |
|------------------------------------------------------------------------------------------------------------|
|    1/1         508.5  [ 0.2%/ 0.0%] |         504.7  [ 1.1%/ 0.1%]  -0.8% |    509.0  [ 0.2%/ 0.0%]   0.1% |
|    2/2        1000.2  [ 1.4%/ 0.1%] |        1004.1  [ 1.6%/ 0.2%]   0.4% |    997.6  [ 1.2%/ 0.1%]  -0.3% |
|    4/4        1912.1  [ 1.0%/ 0.1%] |        1904.2  [ 1.1%/ 0.1%]  -0.4% |   1914.9  [ 1.3%/ 0.1%]   0.1% |
|    8/8        3753.5  [ 0.3%/ 0.0%] |        3748.2  [ 0.3%/ 0.0%]  -0.1% |   3751.3  [ 0.4%/ 0.0%]  -0.1% |
|  16/16        7139.3  [ 2.4%/ 0.2%] |        7137.9  [ 1.8%/ 0.2%]  -0.0% |   7049.2  [ 2.4%/ 0.2%]  -1.3% |
|  32/32       10899.0  [ 4.2%/ 0.4%] |       10780.3  [ 4.4%/ 0.4%]  -1.1% |  10339.2  [ 9.6%/ 0.9%]  -5.1% |
|  64/64       15086.1  [11.5%/ 1.2%] |       14262.0  [ 8.2%/ 0.8%]  -5.5% |  11168.7  [22.2%/ 1.7%] -26.0% |
|128/128       15371.9  [22.0%/ 2.2%] |       14675.8  [14.4%/ 1.4%]  -4.5% |  10963.9  [18.5%/ 1.4%] -28.7% |
|256/256       15990.8  [22.0%/ 2.2%] |       12227.9  [10.3%/ 1.0%] -23.5% |  10469.9  [19.6%/ 1.7%] -34.5% |
'------------------------------------------------------------------------------------------------------------'
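
For reference, the std% and sem% columns can be reproduced roughly as
follows. This is only a minimal sketch, not the actual script: it assumes
the per-second log format quoted above, and the names LINE_RE, SAMPLE_LOG,
parse_log, summarize and rel_change_pct are made up for illustration. It
also assumes sem = std / sqrt(n), which treats the per-second samples as
independent; consecutive seconds of one run are likely correlated, so the
real uncertainty of the mean may be larger.

```python
import math
import re
import statistics

# Assumed log format (taken from the lines quoted earlier in this mail):
#   [ 11s ] thds: 256 eps: 14346.72 lat (ms,95%): 44.17
LINE_RE = re.compile(
    r"\[\s*(\d+)s\s*\]\s+thds:\s*(\d+)\s+eps:\s*([\d.]+)"
    r"\s+lat\s+\(ms,95%\):\s*([\d.]+)"
)

# The ten sample lines quoted in this thread (256 threads).
SAMPLE_LOG = """\
[ 11s ] thds: 256 eps: 14346.72 lat (ms,95%): 44.17
[ 12s ] thds: 256 eps: 14328.45 lat (ms,95%): 44.17
[ 13s ] thds: 256 eps: 13773.06 lat (ms,95%): 43.39
[ 14s ] thds: 256 eps: 13752.31 lat (ms,95%): 43.39
[ 15s ] thds: 256 eps: 15362.79 lat (ms,95%): 43.39
[ 16s ] thds: 256 eps: 26580.65 lat (ms,95%): 35.59
[ 17s ] thds: 256 eps: 15011.78 lat (ms,95%): 36.89
[ 18s ] thds: 256 eps: 15025.78 lat (ms,95%): 39.65
[ 19s ] thds: 256 eps: 15350.87 lat (ms,95%): 39.65
[ 20s ] thds: 256 eps: 15491.70 lat (ms,95%): 36.89
"""


def parse_log(text):
    """Return (eps, lat) lists, one entry per matching interval line."""
    eps, lat = [], []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:  # lines that do not match the assumed format are skipped
            eps.append(float(m.group(3)))
            lat.append(float(m.group(4)))
    return eps, lat


def summarize(samples):
    """Mean, stddev and standard error of the mean, the latter two in % of mean."""
    n = len(samples)
    mean = statistics.mean(samples)
    std = statistics.stdev(samples)   # sample stddev (n - 1 denominator)
    sem = std / math.sqrt(n)          # assumes independent samples
    return {
        "n": n,
        "mean": mean,
        "std_pct": 100.0 * std / mean,
        "sem_pct": 100.0 * sem / mean,
    }


def rel_change_pct(value, baseline):
    """Relative change against the baseline kernel, as in the +/- column."""
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    eps, lat = parse_log(SAMPLE_LOG)
    stats = summarize(eps)
    print("eps: n=%d mean=%.1f std=%.1f%% sem=%.1f%%" % (
        stats["n"], stats["mean"], stats["std_pct"], stats["sem_pct"]))
    # coresched-SMT vs vanilla-SMT at 256/256, using the table values
    print("+/-: %.1f%%" % rel_change_pct(12227.9, 15990.8))
```

The ten lines above are only the excerpt quoted in this thread, so the
numbers they give will not match the table, which was computed over the
whole run.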

Thanks,
-Aubrey
