Re: [PATCH] platform/x86/amd: pmc: Add a workaround for an s0i3 issue on Cezanne

From: Hans de Goede
Date: Wed Dec 07 2022 - 12:20:21 EST


Hi Mario,

On 12/6/22 00:02, Limonciello, Mario wrote:
> [Public]
>
>
>
>> -----Original Message-----
>> From: Hans de Goede <hdegoede@xxxxxxxxxx>
>> Sent: Thursday, November 17, 2022 10:09
>> To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>; S-k, Shyam-sundar
>> <Shyam-sundar.S-k@xxxxxxx>
>> Cc: Mahapatra, Rajib <Rajib.Mahapatra@xxxxxxx>; Raul Rangel
>> <rrangel@xxxxxxxxxxxx>; Mark Gross <markgross@xxxxxxxxxx>; platform-
>> driver-x86@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
>> Subject: Re: [PATCH] platform/x86/amd: pmc: Add a workaround for an s0i3
>> issue on Cezanne
>>
>> Hi,
>>
>> On 11/17/22 17:06, Limonciello, Mario wrote:
>>> [Public]
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Hans de Goede <hdegoede@xxxxxxxxxx>
>>>> Sent: Thursday, November 17, 2022 08:06
>>>> To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>; S-k, Shyam-sundar
>>>> <Shyam-sundar.S-k@xxxxxxx>
>>>> Cc: Mahapatra, Rajib <Rajib.Mahapatra@xxxxxxx>; Raul Rangel
>>>> <rrangel@xxxxxxxxxxxx>; Mark Gross <markgross@xxxxxxxxxx>;
>> platform-
>>>> driver-x86@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
>>>> Subject: Re: [PATCH] platform/x86/amd: pmc: Add a workaround for an
>> s0i3
>>>> issue on Cezanne
>>>>
>>>> Hi Mario,
>>>>
>>>> On 11/16/22 16:43, Mario Limonciello wrote:
>>>>> Cezanne platforms under the right circumstances have a synchronization
>>>>> problem where attempting to enter s2idle may fail if the x86 cores are
>>>>> put into HLT before hardware resume from the previous attempt has
>>>>> completed.
>>>>>
>>>>> To avoid this issue add a 10-20ms delay before entering s2idle another
>>>>> time. This workaround will only be applied on interrupts that wake the
>>>>> hardware but don't break the s2idle loop.
>>>>>
>>>>> Cc: "Mahapatra, Rajib" <Rajib.Mahapatra@xxxxxxx>
>>>>> Cc: "Raul Rangel" <rrangel@xxxxxxxxxxxx>
>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
>>>>
>>>> Thank you for your patch, I've applied this patch to my review-hans
>>>> branch:
>>>>
>> https://git.k
%2F&amp;data=05%7C01%7Cmario.limonciello%40amd.com%7Cb3c04b4449
>> 154cad4f8208dac8b61509%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C
>> 0%7C638042981632459900%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj
>> AwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%
>> 7C%7C&amp;sdata=3ZzdcI0BsHknBInf8V4MfrmNCkkc2U9ygYf4IP25LJ4%3D&
>> amp;reserved=0
>> ernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Fpdx86%2Fplatform-
>>>> drivers-x86.git%2Flog%2F%3Fh%3Dreview-
>>>>
>> hans&amp;data=05%7C01%7Cmario.limonciello%40amd.com%7C674f8bf7a8
>>>>
>> 114f83a3b408dac8a4d941%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C
>>>>
>> 0%7C638042907591739047%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj
>>>>
>> AwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%
>>>>
>> 7C%7C&amp;sdata=XYwl%2FOvUFy%2Bgz9EY9oa35M%2BkLf%2Bud8PKXynQ
>>>> FlrUdoE%3D&amp;reserved=0
>>>>
>>>> Please let me know if it important to get this as a fix into 6.1,
>>>> I wasn't really planning on doing any more fixes pull-reqs for 6.1,
>>>> but I can do one if necessary.
>>>>
>>>
>>> AFAIK it's a corner case. I think it can wait until 6.2, but I think it should
>> probably
>>> be Cc to 6.1 stable (which has the ability to run code in the check()) phase.
>>
>> Ok, I have added a:
>>
>> Cc: stable@xxxxxxxxxxxxxxx # 6.1
>>
>> to the commit msg.
>>
>> Regards,
>>
>> Hans
>
> Hi Hans,
>
> I just wanted to update you on this workaround. Previously it was believed to
> only be a very specific set of circumstances that happened on chromebooks running
> coreboot and an EC running cros_ec being utilized with unfortunate timing.
>
> However it turns out that it can be "relatively" easily reproduced on UEFI machines
> as well though by suspending the laptop and then issuing anything that causes an
> ACPI event that otherwise shouldn't break the s2idle loop (such as closing the lid or
> unplugging the power adapter).
>
> What will happen is that the SOC enters the deepest state up until the time of that
> ACPI event and then never enters again. The most common case this will break I think
> is someone suspends the laptop in GNOME, closes the lid and then tosses it in their bag.
> If you examine /sys/kernel/debug/amd_pmc/* you'll see that the duration of time in
> deepest state matches the time between suspending in GNOME and closing the lid.
>
> I wanted to provide you that context to decide if this should still try to catch this in
> a 6.1 final pull request or not. Had I known how widely this helped at that time I
> would have advocated accordingly.

Based on the above I have prepared a pull-req to Linus with just this single
patch in it to get this added to 6.1 .

I'll Cc you on the pull-req submission to Linus.

Regards,

Hans





>>>>> ---
>>>>> drivers/platform/x86/amd/pmc.c | 6 ++++++
>>>>> 1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/drivers/platform/x86/amd/pmc.c
>>>> b/drivers/platform/x86/amd/pmc.c
>>>>> index ef4ae977b8e0..439d282aafd1 100644
>>>>> --- a/drivers/platform/x86/amd/pmc.c
>>>>> +++ b/drivers/platform/x86/amd/pmc.c
>>>>> @@ -739,8 +739,14 @@ static void amd_pmc_s2idle_prepare(void)
>>>>> static void amd_pmc_s2idle_check(void)
>>>>> {
>>>>> struct amd_pmc_dev *pdev = &pmc;
>>>>> + struct smu_metrics table;
>>>>> int rc;
>>>>>
>>>>> + /* CZN: Ensure that future s0i3 entry attempts at least 10ms passed
>>>> */
>>>>> + if (pdev->cpu_id == AMD_CPU_ID_CZN &&
>>>> !get_metrics_table(pdev, &table) &&
>>>>> + table.s0i3_last_entry_status)
>>>>> + usleep_range(10000, 20000);
>>>>> +
>>>>> /* Dump the IdleMask before we add to the STB */
>>>>> amd_pmc_idlemask_read(pdev, pdev->dev, NULL);
>>>>>
>>>
>