Re: [RFC PATCH 0/2] Disable RT-throttling for idle-inject threads

From: Atul Kumar Pant
Date: Wed Apr 10 2024 - 07:32:01 EST


On Wed, Apr 10, 2024 at 10:54:41AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 10, 2024 at 10:24:15AM +0530, Atul Pant wrote:
> > We are trying to implement a solution for thermal mitigation by using
> > idle injection on CPUs. However we face some limitations with the
> > current idle-inject framework. As per our need, we want to start
> > injecting idle cycles on a cpu for indefinite time (until the
> > temperature/power of the CPU falls below a threshold). This will allow
> > to keep the hot CPUs in the sleep state until we see improvement in
> > temperature/power. If we set idle duration to a large value or have an
> > idle-injection ratio of 100%, then the idle-inject RT thread suffers
> > from RT throttling. This results in the CPU exiting from the sleep state
> > and consume some power.
> >
> > To solve this limitation, we propose a solution to disable RT-throttling
> > whenever idle-inject threads run. We achieve this by not accounting the
> > runtime for the idle-inject threads.
>
> Running RT tasks for indefinite amounts of time will wreck the system.
> Things like workqueues and other per-cpu threads expect service or
> things will pile up and run to ground.
>
> Idle injection, just like every other RT user must not be able to starve
> the system of service.
>
> If your system design requires this (I would argue it is broken), look
> at other means, like CPU-hotplug (which I also really detest) -- which
> takes down the CPU in a controlled manner and avoids the resource
> issues.

Hi Peter,

We are trying to add support for a true 100% idle-injection ratio to the
idle-injection framework. In some cases we need to inject idle cycles for
slightly longer than RT bandwidth control permits. We understand the concern
about the idle-inject thread hogging the CPU. Would it be acceptable to make
this opt-in, so that users of the idle-inject framework can choose whether to
enable true 100% idle-injection support?
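
For context, the ceiling we run into can be estimated from the RT bandwidth
sysctls. The sketch below is illustrative only: it assumes the global
sched_rt_runtime_us and sched_rt_period_us knobs under /proc/sys/kernel are
what limit the idle-inject thread, and the helper names (max_rt_share,
read_sysctl) are made up for this example.

```python
from pathlib import Path

# Assumption: the global RT bandwidth sysctls live here and apply to the
# idle-inject RT thread.
PROC_KERNEL = Path("/proc/sys/kernel")


def max_rt_share(runtime_us: int, period_us: int) -> float:
    """Fraction of each period RT tasks may run before being throttled."""
    if runtime_us < 0:
        # A runtime of -1 means RT throttling is disabled.
        return 1.0
    return min(runtime_us / period_us, 1.0)


def read_sysctl(name: str) -> int:
    """Read an integer sysctl value from /proc/sys/kernel (hypothetical helper)."""
    return int((PROC_KERNEL / name).read_text())


if __name__ == "__main__":
    runtime = read_sysctl("sched_rt_runtime_us")
    period = read_sysctl("sched_rt_period_us")
    share = max_rt_share(runtime, period)
    print(f"max sustainable idle-injection ratio: {share:.0%}")
```

With the usual defaults of 950000 and 1000000 this works out to 95%, which is
why a 100% ratio cannot be sustained without the thread being throttled.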

Thanks
Atul