On Wed, Mar 17, 2021 at 11:44 PM Pratik Sampat <psampat@xxxxxxxxxxxxx> wrote:
> Hi Doug,...
> Thanks for trying these patches out.
>
> On 18/03/21 2:30 am, Doug Smythies wrote:
> > Hi Pratik,
> >
> > It just so happens that I have been trying Artem's version this last
> > week, so I tried yours.
> >
> > On Mon, Mar 15, 2021 at 4:49 AM Pratik Rajesh Sampat
> > <psampat@xxxxxxxxxxxxx> wrote:
> > ...
> > Other notes:
> >
> > No idle state for CPU 0 ever gets disabled.
> > I assume this is because CPU 0 can never be offline,
> > so that bit of code (Disable all stop states) doesn't find its state.
> > By the way, processor = Intel i5-9600K
>
> I had tried these patches on an IBM POWER 9 processor and disabling CPU0's idle
> state works there. However, it does make sense for some processors to treat CPU
> 0 differently.
> Maybe I could write in a case if idle state disabling fails for a CPU then we
> just skip it?

I didn't try it, I just did a hack so I could continue for this reply.
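
For what it is worth, the skip-on-failure idea might look roughly like the sketch below. It is only an illustration: SYSFS_CPU, try_disable_idle and disable_idle_or_skip are made-up names that do not exist in the posted cpuidle.sh, and NUM_CPUS is assumed to be set the way the script already sets it.

```bash
#!/bin/bash
# Sketch only: SYSFS_CPU, try_disable_idle and disable_idle_or_skip are
# hypothetical names, not part of the posted cpuidle.sh.
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}

# Try to disable every idle state of one CPU. Returns non-zero at the
# first state whose "disable" file is missing or rejects the write
# (states written before that point stay disabled).
try_disable_idle()
{
	local cpu=$1 f
	for f in "$SYSFS_CPU/cpu$cpu"/cpuidle/state*/disable; do
		[ -w "$f" ] || return 1
		{ echo 1 > "$f"; } 2>/dev/null || return 1
	done
}

# Walk all CPUs and report the ones that could not be handled,
# instead of deciding up front from the "online" file.
disable_idle_or_skip()
{
	local cpu
	for ((cpu = 0; cpu < NUM_CPUS; cpu++)); do
		try_disable_idle "$cpu" || echo "skipping cpu$cpu"
	done
}
```

With something like this, CPU 0 would be treated like any other CPU and only skipped if the write really fails.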
> > The system is left with all idle states disabled, well not for CPU 0
> > as per the above comment. The suggestion is to restore them,
> > otherwise my processor hogs 42 watts instead of 2.
> >
> > My results are highly variable per test.
>
> Question: Do you notice high variability with IPI test, Timer test or both?

The IPI test has less variability than the Timer test.
> I can think of two reasons for high run to run variance:
>
> 1. If you observe variance in timer tests, then I believe there could be a
> mechanism of "C-state pre-wake" on some Intel machines at play here, which can
> pre-wake a CPU from an idle state when timers are armed. I'm not sure if the
> Intel platform that you're running on does that or not.
> Artem had described this behavior to me a while ago and I think his wult page
> describes this behavior in more detail:
> https://intel.github.io/wult/#c-state-pre-wake

Yes, I have reviewed all the references.
And yes, I think my processors have the pre-wake stuff.
I do not have the proper hardware to do the Artem pre-wake workaround
method, but might buy it in future.
> 2. I have noticed variability in results when there are kernel book-keeping or
> jitter tasks scheduled from time to time on an otherwise idle core.
> In the full per-CPU logs at tools/testing/selftests/cpuidle/cpuidle.log can you
> spot any obvious outliers per-CPU state?

Yes.
I'll just paste in an example cpuidle.log file having used the -v option
below, along with my hack job diff.
doug@s19:~/temp-k-git/linux/tools/testing/selftests/cpuidle$ cat cpuidle.log.v3-1
--IPI Latency Test---
--Baseline IPI Latency measurement: CPU Busy--
SRC_CPU DEST_CPU IPI_Latency(ns)
0 0 140
0 1 632
0 2 675
0 3 671
0 4 675
0 5 767
0 6 653
0 7 826
0 8 819
0 9 615
0 10 758
0 11 758
Baseline Avg IPI latency(ns): 665
---Enabling state: 0---
SRC_CPU DEST_CPU IPI_Latency(ns)
0 0 76
0 1 484
0 2 494
0 3 539
0 4 498
0 5 491
0 6 474
0 7 434
0 8 544
0 9 476
0 10 447
0 11 467
Expected IPI latency(ns): 0
Observed Avg IPI latency(ns) - State 0: 452
---Enabling state: 1---
SRC_CPU DEST_CPU IPI_Latency(ns)
0 0 72
0 1 1081
0 2 821
0 3 1486
0 4 1022
0 5 960
0 6 1634
0 7 933
0 8 1032
0 9 1046
0 10 1430
0 11 1338
Expected IPI latency(ns): 1000
Observed Avg IPI latency(ns) - State 1: 1071
---Enabling state: 2---
SRC_CPU DEST_CPU IPI_Latency(ns)
0 0 264
0 1 30836
0 2 30562
0 3 30748
0 4 35286
0 5 30978
0 6 1952
0 7 36066
0 8 30670
0 9 30605
0 10 30635
0 11 35423
Expected IPI latency(ns): 120000
Observed Avg IPI latency(ns) - State 2: 27002
---Enabling state: 3---
SRC_CPU DEST_CPU IPI_Latency(ns)
0 0 71
0 1 30853
0 2 32095
0 3 32661
0 4 30230
0 5 34348
0 6 2012
0 7 30816
0 8 30908
0 9 31130
0 10 34150
0 11 32050
Expected IPI latency(ns): 1034000
Observed Avg IPI latency(ns) - State 3: 26777
--Timeout Latency Test--
--Baseline Timeout Latency measurement: CPU Busy--
Wakeup_src Baseline_delay(ns)
0 453
1 568
2 387
3 337
4 433
5 579
6 330
7 400
8 561
9 544
10 569
11 523
Baseline Avg timeout diff(ns): 473
---Enabling state: 0---
Wakeup_src Baseline_delay(ns) Delay(ns)
0 399
1 388
2 352
3 385
4 334
5 415
6 320
7 356
8 401
9 379
10 339
11 384
Expected timeout(ns): 200
Observed Avg timeout diff(ns) - State 0: 371
---Enabling state: 1---
Wakeup_src Baseline_delay(ns) Delay(ns)
0 666
1 575
2 608
3 590
4 608
5 552
6 582
7 593
8 597
9 587
10 588
11 610
Expected timeout(ns): 1200
Observed Avg timeout diff(ns) - State 1: 596
---Enabling state: 2---
Wakeup_src Baseline_delay(ns) Delay(ns)
0 36386
1 1069
2 866
3 884
4 850
5 55642
6 408082
7 1184
8 406075
9 406830
10 414105
11 406594
Expected timeout(ns): 360200
Observed Avg timeout diff(ns) - State 2: 178213
---Enabling state: 3---
Wakeup_src Baseline_delay(ns) Delay(ns)
0 406049
1 913
2 410134
3 921
4 406237
5 950
6 407181
7 920
8 407678
9 894
10 406320
11 304161
Expected timeout(ns): 3102200
Observed Avg timeout diff(ns) - State 3: 229363
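
Picking the outliers out of a log like the one above by eye gets tedious, so here is a rough sketch of one way to flag them. It assumes the log layout shown above (section header lines wrapped in dashes, data rows made up only of numbers, measured value in the last column). The function name flag_outliers and the factor-of-4 threshold around the per-section median are arbitrary choices for illustration and are not part of the selftest.

```bash
#!/bin/bash
# Sketch only: flag_outliers and the default factor of 4 are
# hypothetical choices, not part of the cpuidle selftest.
# Usage: flag_outliers <logfile> [factor]
flag_outliers()
{
	awk -v factor="${2:-4}" '
	# Report rows of the current section that are more than "factor"
	# times above or below the section median, then reset the section.
	function report(   i, j, t, med, s) {
		if (n == 0) return
		for (i = 1; i <= n; i++) s[i] = val[i]
		# insertion sort of the copy, then take the lower median
		for (i = 2; i <= n; i++) {
			t = s[i]
			for (j = i - 1; j >= 1 && s[j] > t; j--) s[j + 1] = s[j]
			s[j + 1] = t
		}
		med = s[int((n + 1) / 2)]
		for (i = 1; i <= n; i++)
			if (val[i] * factor < med || val[i] > med * factor)
				printf "%s: %s (median %s)\n", section, row[i], med
		n = 0
	}
	# Section headers start and end with dashes.
	/^-+.*-+$/ { report(); section = $0; next }
	# Data rows contain only digits and spaces; the value is the last field.
	/^[0-9 ]+$/ && NF >= 2 { n++; row[n] = $0; val[n] = $NF + 0; next }
	# Any other line (column titles, averages) closes the section.
	{ report() }
	END { report() }
	' "$1"
}
```

It would be run as `flag_outliers cpuidle.log.v3-1`, optionally with a different factor as the second argument.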
My hack job, (CPUs always online):
diff --git a/tools/testing/selftests/cpuidle/cpuidle.sh b/tools/testing/selftests/cpuidle/cpuidle.sh
index de5141d5b76b..70bdacda5e91 100755
--- a/tools/testing/selftests/cpuidle/cpuidle.sh
+++ b/tools/testing/selftests/cpuidle/cpuidle.sh
@@ -86,10 +86,6 @@ disable_idle()
{
for ((cpu=0; cpu<NUM_CPUS; cpu++))
do
- local cpu_status=$(cpu_is_online $cpu)
- if [ $cpu_status == 0 ]; then
- continue
- fi
for ((state=0; state<NUM_STATES; state++))
do
echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable
@@ -104,10 +100,6 @@ op_state()
{
for ((cpu=0; cpu<NUM_CPUS; cpu++))
do
- local cpu_status=$(cpu_is_online $cpu)
- if [ $cpu_status == 0 ]; then
- continue
- fi
echo $1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$2/disable
done
}
@@ -124,17 +116,6 @@ cpuidle_disable_state()
op_state 1 $state
}
-cpu_is_online()
-{
- cpu=$1
- if [ ! -f "/sys/devices/system/cpu/cpu$cpu/online" ]; then
- echo 0
- return
- fi
- status=$(cat /sys/devices/system/cpu/cpu$cpu/online)
- echo $status
-}
-
# Extract latency in microseconds and convert to nanoseconds
extract_latency()
{
@@ -179,10 +160,6 @@ run_ipi_tests()
printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG
for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT))
do
- local cpu_status=$(cpu_is_online $cpu)
- if [ $cpu_status == 0 ]; then
- continue
- fi
ipi_test_once "baseline" $cpu
printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG
avg_arr+=($ipi_latency)
@@ -198,10 +175,6 @@ run_ipi_tests()
printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG
for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT))
do
- local cpu_status=$(cpu_is_online $cpu)
- if [ $cpu_status == 0 ]; then
- continue
- fi
# Running IPI test and logging results
sleep 1
ipi_test_once "test" $cpu
@@ -262,10 +235,6 @@ run_timeout_tests()
printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)">> $LOG
for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT))
do
- local cpu_status=$(cpu_is_online $cpu)
- if [ $cpu_status == 0 ]; then
- continue
- fi
timeout_test_once "baseline" $cpu 1000000
printf "%-3s %13s\n" $src_cpu $timeout_diff >> $LOG
avg_arr+=($timeout_diff)
@@ -281,10 +250,6 @@ run_timeout_tests()
printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)" "Delay(ns)" >> $LOG
for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT))
do
- local cpu_status=$(cpu_is_online $cpu)
- if [ $cpu_status == 0 ]; then
- continue
- fi
timeout_test_once "test" $cpu 1000000
printf "%-3s %13s %18s\n" $src_cpu $baseline_timeout_diff $timeout_diff >> $LOG
avg_arr+=($timeout_diff)
@@ -314,3 +279,7 @@ run_timeout_tests
printf "Removing $MODULE module\n"
printf "Full Output logged at: $LOG\n"
rmmod $MODULE
+
+printf "enabling idle states\n"
+
+echo 0 | tee /sys/devices/system/cpu/cpu*/cpuidle/state*/disable
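
The hack above re-enables every state unconditionally. A slightly more careful variant, sketched below, would remember whatever was in each "disable" file before the test and put that back on exit, so states the user had disabled on purpose stay disabled. SYSFS_CPU, SAVED_IDLE_STATES, save_idle_states and restore_idle_states are made-up names for illustration and are not in the posted script.

```bash
#!/bin/bash
# Sketch only: SYSFS_CPU, SAVED_IDLE_STATES, save_idle_states and
# restore_idle_states are hypothetical names, not part of cpuidle.sh.
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}
SAVED_IDLE_STATES=""

# Record "<path> <value>" for every readable per-state disable file.
save_idle_states()
{
	local f
	SAVED_IDLE_STATES=""
	for f in "$SYSFS_CPU"/cpu*/cpuidle/state*/disable; do
		[ -r "$f" ] || continue
		SAVED_IDLE_STATES+="$f $(cat "$f")"$'\n'
	done
}

# Write each recorded value back to the file it came from.
restore_idle_states()
{
	local f val
	while read -r f val; do
		if [ -n "$f" ]; then
			echo "$val" > "$f"
		fi
	done <<< "$SAVED_IDLE_STATES"
}

# In the test script this would go before the first disable_idle call:
#   save_idle_states
#   trap restore_idle_states EXIT
```

Using a trap on EXIT also covers the case where the script is interrupted part way through, which would otherwise leave the idle states disabled.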