Re: [PATCH v3 10/15] cpufreq/amd-pstate: Move limit updating code
From: Dhananjay Ugwekar
Date: Wed Dec 18 2024 - 23:25:08 EST
On 12/18/2024 1:14 AM, Mario Limonciello wrote:
> On 12/17/2024 00:50, Dhananjay Ugwekar wrote:
>> On 12/16/2024 9:09 PM, Mario Limonciello wrote:
>>> On 12/16/2024 08:45, Dhananjay Ugwekar wrote:
>>>> On 12/16/2024 7:51 PM, Mario Limonciello wrote:
>>>>> On 12/16/2024 08:16, Dhananjay Ugwekar wrote:
>>>>>> Hello Mario,
>>>>>>
>>>>>> On 12/10/2024 12:22 AM, Mario Limonciello wrote:
>>>>>>> The limit updating code in amd_pstate_epp_update_limit() should not
>>>>>>> only apply to EPP updates. Move it to amd_pstate_update_min_max_limit()
>>>>>>> so other callers can benefit as well.
>>>>>>>
>>>>>>> With this move it's not necessary to have clamp_t calls anymore because
>>>>>>> the verify callback is called when setting limits.
>>>>>>
>>>>>> While testing this series, I observed that with amd_pstate=passive + schedutil governor,
>>>>>> the scaling_max_freq limits were not being honored and I bisected the issue down to this
>>>>>> patch.
>>>>>>
>>>>>> I went through the code and noticed that in amd_pstate_adjust_perf(), we set the min_perf
>>>>>> field in MSR_AMD_CPPC_REQ to "cap_perf" which is equal to cpudata->highest_perf (which is
>>>>>> equal to 255 for non-preferred cores systems). This didnt seem logical to me and I changed
>>>>>> cap_perf to cpudata->max_limit_perf which gives us the value updated in scaling_max_freq.
>>>>>>
>>>>>> I think as we removed the redundant clamping code, this pre-existing issue got exposed.
>>>>>> The below diff fixes the issue for me.
>>>>>>
>>>>>> Please let me know your thoughts on this.
>>>>>>
>>>>>> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
>>>>>> index d7b1de97727a..1ac34e3f1fc5 100644
>>>>>> --- a/drivers/cpufreq/amd-pstate.c
>>>>>> +++ b/drivers/cpufreq/amd-pstate.c
>>>>>> @@ -699,7 +699,7 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
>>>>>> if (min_perf < lowest_nonlinear_perf)
>>>>>> min_perf = lowest_nonlinear_perf;
>>>> here^^^
>>>>>>
>>>>>> - max_perf = cap_perf;
>>>>>> + max_perf = cpudata->max_limit_perf;
>>>>>> if (max_perf < min_perf)
>>>>>> max_perf = min_perf;
>>>>>
>>>>> With this change I think you can also drop the comparison afterwards, as an optimization right?
>>>>
>>>> Umm I think it is possible that scaling_max_freq is set to a value lower than
>>>> lowest_nonlinear_freq in that case this if condition would be needed (as min_perf
>>>> is being lower bounded at lowest_nonlinear_freq at the location highlighted above).
>>>> I would be okay with keeping this check in.
>>>
>>> Well this feels like a bigger problem actually - why is it forcefully bounded at lowest nonlinear freq? Performance is going to be awful at that level
>>
>> Actually this wont necessarily deteriorate the performance, as we are just restricting
>> the min_perf to not go below lowest_nonlinear level. So we are actually ensuring that
>> the schedutil doesnt select a des_perf below lowest_nonlinear_perf.
>>
>> (hence why commit 5d9a354cf839a ("cpufreq/amd-pstate: Set the initial min_freq to lowest_nonlinear_freq") was done),
>
> Sorry re-reading I didn't get my thought out properly, I meant to say performance is going to be bad BELOW that level. We're in total agreement here.
>
>>>
>>> but shouldn't we "let" people go below that in passive and guided? We do for active.
>>
>> Yes I agree, we should allow the user to set min limit in the entire frequency range,
>> I thought there would've been some reason for restricting this. But I dont see any
>> reasoning for this in the blamed commit log as well. I think one reason would be that
>> below lowest_nonlinear_freq we dont get real power savings. And schedutil might dip
>> into this lower inefficient range if we dont force bound it.
>
> OK I guess to avoid regressions let's leave it as is and do a minimal change and we can revisit lifting this restriction later after you get testing done with it to see what actually happens.
Agreed, I think as we initialize min_perf with lowest_nonlinear_perf at boot time,
out-of-box there wont be any performance regressions. It is only if the user shoots himself
in the foot by lowering the min_perf further, they'll get bad performance and bad power savings.
We can do some performance testing, and then remove this if condition later on as you suggested.
>
>>
>> Thanks,
>> Dhananjay
>>
>>>
>>>>
>>>> Also, what is the behavior if max_perf is set to a value lower than min_perf in
>>>> the CPPC_REQ MSR? I guess platform FW would also be smart enough to handle this
>>>> implicitly, but cant say for sure.
>>>>
>>>
>>> I would hope so too; but yeah you're right we don't know for sure.
>>>
>>>>>
>>>>> As this is already in superm1.git/linux-next after testing can you please send a patch relative to superm1.git/linux-next branch?
>>>>
>>>> Sure, I'll send out the patch once we finalize on the above if condition.
>>>>
>>>> Regards,
>>>> Dhananjay
>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Dhananjay