Re: [PATCH 0/1] Forced-wakeup for stop lite states on Powernv

From: Gautham R Shenoy
Date: Thu May 16 2019 - 04:24:43 EST


Hi Nicholas,

On Thu, May 16, 2019 at 04:13:17PM +1000, Nicholas Piggin wrote:

>
> > The motivation behind this patch was a HPC customer issue where they
> > were observing some CPUs in the core getting stuck at stop0_lite
> > state, thereby lowering the performance on the other CPUs of the core
> > which were running the application.
> >
> > Disabling stop0_lite via sysfs didn't help since we would fallback to
> > snooze and it would make matters worse.
>
> snooze has the timeout though, so it should kick into stop0 properly
> (and if it doesn't that's another issue that should be fixed in this
> series).
>
> I'm not questioning the patch for stop0_lite, to be clear. I think
> the logic is sound. I just raise one urelated issue that happens to
> be for stop0_lite as well (should we even enable it on P9?), and one
> peripheral issue (should we make a similar fix for deeper stop states?)
>

I think it makes sense to generalize this from the point of view of
CPUs remaining in shallower idle states for long durations on tickless
kernels.

> >
> >>
> >> We should always have fewer states unless proven otherwise.
> >
> > I agree.
> >
> >>
> >> That said, we enable it today so I don't want to argue this point
> >> here, because it is a different issue from your patch.
> >>
> >> > When it is in stop0 or deeper,
> >> > it free up both
> >> > space and time slice of core.
> >> > In stop0_lite, cpu doesn't free up the core resources and thus inhibits
> >> > thread
> >> > folding. When a cpu goes to stop0, it will free up the core resources
> >> > thus increasing
> >> > the single thread performance of other sibling thread.
> >> > Hence, we do not want to get stuck in stop0_lite for long duration, and
> >> > want to quickly
> >> > move onto the next state.
> >> > If we get stuck in any other state we would possibly be losing on to
> >> > power saving,
> >> > but will still be able to gain the performance benefits for other
> >> > sibling threads.
> >>
> >> That's true, but stop0 -> deeper stop is also a benefit (for
> >> performance if we have some power/thermal constraints, and/or for power
> >> usage).
> >>
> >> Sure it may not be so noticable as the SMT switch, but I just wonder
> >> if the infrastructure should be there for the same reason.
> >>
> >> I was testing interrupt frequency on some tickless workloads configs,
> >> and without too much trouble you can get CPUs to sleep with no
> >> interrupts for many minutes. Hours even. We wouldn't want the CPU to
> >> stay in stop0 for that long.
> >
> > If it stays in stop0 or even stop2 for that long, we would want to
> > "promote" it to a deeper state, such as say STOP5 which allows the
> > other cores to run at higher frequencies.
>
> So we would want this same logic for all but the deepest runtime
> stop state?

Yes. We can, in steps, promote individual threads of the core to
eventually request a deeper state such as stop4/5. On a completely
idle tickless system, eventually we should see the core go to the
deeper idle state.

>
> >> Just thinking about the patch itself, I wonder do you need a full
> >> kernel timer, or could we just set the decrementer? Is there much
> >> performance cost here?
> >>
> >
> > Good point. A decrementer would do actually.
>
> That would be good if it does, might save a few cycles.
>
> Thanks,
> Nick
>

--
Thanks and Regards
gautham.