Re: [PATCHSET v6 0/4] Split iowait into two states

From: Christian Loehle
Date: Thu Sep 05 2024 - 07:10:02 EST


On 9/5/24 12:00, Peter Zijlstra wrote:
> On Thu, Sep 05, 2024 at 11:31:09AM +0100, Christian Loehle wrote:
>> On 9/5/24 10:36, Peter Zijlstra wrote:
>>> On Wed, Sep 04, 2024 at 05:18:57PM +0200, Rafael J. Wysocki wrote:
>>>
>>>> To be more precise, there are two different uses of "iowait" in PM.
>>>>
>>>> One is the nr_iowait_cpu() call in menu_select() and the result of it
>>>> is used for two purposes: (1) select different sets of statistics
>>>> depending on whether or not this number is zero and (2) set a limit
>>>> for the idle state's exit latency that depends on this number (but
>>>> note that it only takes effect when the "iowait" statistics are used
>>>> in the first place). Both of these are arguably questionable and it
>>>> is unclear to me whether or not they actually help and how much.
>>>
>>> So this one is very dubious, it relies on tasks getting back on the CPU
>>> they went to sleep on -- not guaranteed at all.
>>>
>>>> The other use is boosting CPU frequency in schedutil and intel_pstate
>>>> if SCHED_CPUFREQ_IOWAIT is passed to them which in turn depends on the
>>>> p->in_iowait value in enqueue_task_fair().
>>>
>>> This one is fine and makes sense. At this point we know that p is going
>>> to run and where it is going to run.
>>
>> On any even remotely realistic scenario and hardware though the boost
>> isn't effective until the next enqueue-dequeue-cycle, so if your above
>> objection is based on that, I would object here too, using your argument.
>
> That is a quality of implementation issue with schedutil no?

Is it? There is a latency between requesting a new frequency and actually
running at it. On both the x86 and arm platforms out there that should still
be at least a few usecs, during which the task is already running. The task
will dequeue quite soon (otherwise it builds up utilization, and then it's
not one we consider problematic wrt this io utilization problem anyway).
Just to be clear, I'm assuming fast_switch here, in which case I think
schedutil's implementation isn't the problem; rather, the premise of the
underlying problem is.
I have tried to elaborate on that in the RFC I've posted and linked though.
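
To put rough numbers on the latency argument, here is a minimal sketch. It is
not kernel code: boosted_share_pct() and its parameters are hypothetical
names, and the model simply assumes the boost is requested at enqueue, takes
effect switch_latency_us later, and the task dequeues after run_us.

```c
#include <assert.h>

/*
 * Hypothetical model, not kernel code.
 *
 * The task is enqueued at t = 0 and the boost is requested at that point.
 * The new frequency only takes effect after switch_latency_us, and the
 * task dequeues again after run_us.
 *
 * Returns the share of this run, in percent (rounded down), that actually
 * executes at the boosted frequency.
 */
static unsigned int boosted_share_pct(unsigned int run_us,
				      unsigned int switch_latency_us)
{
	assert(run_us > 0);

	/* The task dequeues before the switch completes: no benefit. */
	if (switch_latency_us >= run_us)
		return 0;

	return (run_us - switch_latency_us) * 100 / run_us;
}
```

With an assumed 5 usec switch latency (illustrative, not measured), a task
that runs for 5 usecs or less sees none of the boost in that run, a 10 usec
run sees it for half of the run, and only a run of around 100 usecs gets
close to the full effect. Under this model, for the short-running tasks in
question the boost mostly pays off on the next enqueue-dequeue cycle.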