Re: [RFC][PATCH 1/3] cpuidle: Inject tick boundary state

From: Peter Zijlstra
Date: Wed Aug 02 2023 - 09:23:54 EST


On Wed, Aug 02, 2023 at 02:44:33PM +0200, Rafael J. Wysocki wrote:
> On Wed, Aug 2, 2023 at 12:34 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Mon, Jul 31, 2023 at 06:55:35PM +0200, Rafael J. Wysocki wrote:
> >
> > > > In that case you cannot tell the difference between I'm good to use this
> > > > state and I'm good to disable the tick and still use this state.
> > >
> > > No, you don't, but is it really worth the fuss?
> >
> > My somewhat aged IVB-EP sits around 25 us for restarting the tick.
> >
> > Depending on the C state, that is a significant chunk of exit latency,
> > and depending on how often you do the whole NOHZ dance, this can add up
> > to significant lost runtime too.
> >
> > And these are all machines that have a usable TSC, these numbers all go
> > up significantly when you somehow end up on the HPET or similar wreckage.
> >
> > Stopping the tick is slightly more expensive, but in the same order, I
> > get around 30 us on the IVB, vs 25 for restarting it. Reprogramming the
> > timer (LAPIC/TSC-DEADLINE) is the main chunk of it I suspect.
> >
> > So over-all that's 55 us extra latency for the full idle path, which can
> > definitely hurt.
> >
> > So yeah, I would say this is all worth it.
>
> I agree that, in general, it is good to avoid stopping the tick when
> it is not necessary to stop it.
>
> > My ADL is somewhat better, but also much higher clocked, and gets around
> > 10 us for a big core and 16 us for a little core for restarting the
> > tick.
>
> But my overall point is different.
>
> An additional bin would possibly help if the deepest state has been
> selected and its target residency is below the tick, and the closest
> timer (other than the tick) is beyond the tick. So how much of a
> difference would be made by making this particular case more accurate?

Many of the server parts have a deepest idle state with a target
residency around 600us, and distros ship HZ=250, i.e. a 4000us tick
period. So every idle period 600us < x < 4000us would unnecessarily
disable the tick.
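
To put rough numbers on that window, here is a minimal sketch using
only figures from this thread (600us target residency, HZ=250, and the
IVB-EP tick stop/restart costs of about 30us and 25us). The function
names and the simple two-threshold model are illustrative assumptions,
not the governor's actual logic:

```python
# Illustrative arithmetic only; this is not kernel code.
HZ = 250                             # common distro setting
TICK_PERIOD_US = 1_000_000 // HZ     # 4000 us between ticks
DEEPEST_TARGET_RESIDENCY_US = 600    # rough figure for many server parts
TICK_STOP_US = 30                    # IVB-EP figure quoted in this thread
TICK_RESTART_US = 25                 # IVB-EP figure quoted in this thread


def classify_idle(predicted_idle_us):
    """Return (use_deepest_state, tick_stop_needed) for a predicted idle length."""
    use_deepest = predicted_idle_us > DEEPEST_TARGET_RESIDENCY_US
    # Stopping the tick only helps if the tick would otherwise fire
    # before the predicted idle period ends.
    tick_stop_needed = predicted_idle_us >= TICK_PERIOD_US
    return use_deepest, tick_stop_needed


def wasted_overhead_us(predicted_idle_us):
    """Stop/restart cost paid if picking the deepest state always stops the tick."""
    use_deepest, tick_stop_needed = classify_idle(predicted_idle_us)
    if use_deepest and not tick_stop_needed:
        return TICK_STOP_US + TICK_RESTART_US
    return 0


if __name__ == "__main__":
    for idle_us in (500, 1000, 3999, 5000):
        print(idle_us, classify_idle(idle_us), wasted_overhead_us(idle_us))
```

Under these assumptions any predicted idle period strictly between
600us and 4000us qualifies for the deepest state without needing the
tick stopped, so stopping it anyway costs the stop plus restart time.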

How often this happens is of course workload dependent, but if unlucky
it could be a lot. It also adds the above-mentioned tick stop/restart
latency to the idle path, which for those parts is a significant chunk
on top of the exit latency.

The fix is 'trivial', why not do it?

Anyway, let me post my latest hackery :-)