Re: [RFC/RFT][PATCH] cpuidle: New timer events oriented governor for tickless systems

From: Rafael J. Wysocki
Date: Tue Oct 16 2018 - 04:03:21 EST


On Tuesday, October 16, 2018 5:00:19 AM CEST Doug Smythies wrote:
> On 2018.10.15 00:52 Rafael J. Wysocki wrote:
> > On Sun, Oct 14, 2018 at 8:53 AM Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> >> On 2018.10.11 14:02 Rafael J. Wysocki wrote:
> >
> > ...[cut]...
> >
> >>> Overall, it selects deeper idle states than menu more often, but
> >>> that doesn't seem to make a significant difference in the majority
> >>> of cases.
> >>
> >> Not always: that vicious "powernightmare" sweep test that I run used
> >> way, way more processor package power and spent a staggering amount
> >> of time in idle state 0 [1].
> >
> > Can you please remind me what exactly the workload is in that test?
>
> The problem with my main test computer is that I have never had a good
> way to make it use idle state 0 and/or idle state 1 a significant amount
> of the time while not setting the need-resched flag. Due to the minimum
> overheads involved, a tight-loop C program calling nanosleep() with an
> argument of only 1 nanosecond will actually take about 50 microseconds
> per call (44 to 57 measured), which is much too long to invoke idle
> state 0 or 1 (at least on my test computer). So, for my older 8-CPU
> i7-2600K, the idea is to spin up 40 threads doing short sleeps in an
> attempt to pile up events such that the shallower idle states are
> invoked more often.
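>
> (Not the actual test program, just a minimal sketch of the kind of
> overhead check described above: time a tight loop of nanosleep() calls
> that each request only 1 nanosecond and report the average per call.
> The loop count and file name are arbitrary.)
>
> /* Hypothetical sketch; build with something like "gcc -O2 -o ns ns.c"
>  * (older glibc may also need "-lrt" for clock_gettime()). */
> #define _POSIX_C_SOURCE 200809L
> #include <stdio.h>
> #include <time.h>
>
> int main(void)
> {
> 	struct timespec req = { .tv_sec = 0, .tv_nsec = 1 };
> 	struct timespec start, end;
> 	const int loops = 100000;
> 	double elapsed_us;
> 	int i;
>
> 	clock_gettime(CLOCK_MONOTONIC, &start);
> 	for (i = 0; i < loops; i++)
> 		nanosleep(&req, NULL);
> 	clock_gettime(CLOCK_MONOTONIC, &end);
>
> 	elapsed_us = (end.tv_sec - start.tv_sec) * 1e6 +
> 		     (end.tv_nsec - start.tv_nsec) / 1e3;
> 	/* Here this comes out around 44 to 57 us per call, as above. */
> 	printf("average per call: %.1f us\n", elapsed_us / loops);
> 	return 0;
> }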
>
> Why 40 threads, one might wonder? This was many months ago now, but
> I tested quite a number of threads, and 40 seemed to provide the
> most interesting results for this type of work. I have not rechecked
> it since (probably should).
>
> For the testing I did in August for this:
>
> "[PATCH] cpuidle: menu: Retain tick when shallow state is selected"
> [2].
> The thinking was to sweep through a wide range of sleep times
> and see if anything odd showed up. The test description is copied
> here:
>
> In [2], I wrote:
> > Test 1: A Thomas Ilsche type "powernightmare" test:
> > (forever: (10 times a variable-usec sleep) then a 0.999 second sleep) X 40 staggered
> > threads, where the "variable" sleep was swept from 0.05 to 5 usec in steps of 0.05 for
> > the first ~200 minutes of the test (note: overheads mean that actual loop times are
> > quite different), and then from 5 to 500 usec in steps of 1 for the remaining 1000
> > minutes of the test. Each step ran for 2 minutes. The system was idle for 1 minute at
> > the start, and a few minutes at the end of the graphs.
> > While called "kernel 4.18", the baseline was actually from mainline at head =
> > df2def4, or just after Rafael's linux-pm "pm-4.19-rc1-2" merge (actually, after
> > the next ACPI merge).
> > Reference kernel = df2def4 with the two patches reverted.
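>
> (Again, not the actual test code, just a rough sketch of what each of
> the 40 staggered threads was meant to do; the real script also steps
> "variable_usec" every 2 minutes, and, as noted below, the long sleep
> never actually happened.)
>
> #define _POSIX_C_SOURCE 200809L
> #include <pthread.h>
> #include <time.h>
>
> #define NUM_THREADS	40
>
> static double variable_usec;	/* swept from 0.05 to 5, then 5 to 500 */
>
> static void short_sleep(double usec)
> {
> 	struct timespec ts = {
> 		.tv_sec = 0,
> 		.tv_nsec = (long)(usec * 1000.0),
> 	};
>
> 	nanosleep(&ts, NULL);
> }
>
> static void *sweep_thread(void *arg)
> {
> 	int i;
>
> 	(void)arg;
> 	for (;;) {
> 		/* 10 short, variable length sleeps ... */
> 		for (i = 0; i < 10; i++)
> 			short_sleep(variable_usec);
> 		/* ... then the "0.999 seconds" sleep */
> 		short_sleep(999000.0);
> 	}
> 	return NULL;
> }
>
> int main(void)
> {
> 	pthread_t threads[NUM_THREADS];
> 	int i;
>
> 	variable_usec = 0.05;	/* first step of the sweep */
> 	for (i = 0; i < NUM_THREADS; i++) {
> 		pthread_create(&threads[i], NULL, sweep_thread, NULL);
> 		short_sleep(25000.0);	/* stagger the thread starts */
> 	}
> 	pthread_join(threads[0], NULL);	/* runs until the test is stopped */
> 	return 0;
> }
>
> (Built with something like "gcc -O2 -pthread sweep.c"; the 25 ms stagger
> and the file name are just guesses on my part.)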
>
> However, that description was flawed, because there actually was never
> a long sleep (incompetence on my part, but it doesn't really matter).
> That test ran for 1200 minutes, and is worth looking at [3].
> Notice how, as the test progresses, a migration through the idle
> states can be observed, just as expected.
>
> The most recent earlier reference for this test was with the 8-patch
> set on top of kernel 4.19-rc6 [4], from a week ago. However, I shortened
> the test by 900 minutes. Why? Well, there is only so much time in a day.
>
> So now, back to the test this thread is about [1]. It might be
> argued that the TEO governor should be spending more time in
> idle state 0 near the start of the test, as the results show.
> The trace data does, perhaps, support such an argument, but I
> haven't had time to dig into it.
>
> I also wonder if some of the weirdness later in the test is
> repeatable or not (re: discussion elsewhere on this thread,
> now cut, about lack of repeatability). However, I have not
> had time to repeat the test.
>
> Hope this helps, and sorry for any confusion and this long e-mail.

Yes, it helps, many thanks again and no worries about long emails. :-)

I'm going to make some changes to the new governor to take your observations
into account.

Cheers,
Rafael