Re: Regression in PMC code in 6.12-rc1
From: Marek Maślanka
Date: Sat Oct 12 2024 - 14:30:57 EST
Hi Hans,
On Thu, Oct 10, 2024 at 4:12 PM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
>
> Hi Marek,
>
> On 10-Oct-24 4:09 PM, Marek Maślanka wrote:
> > Hi Franz,
>
> Franz? I guess you are trying to address me (Hans)?
Yes! Forgive me for this mistake!
>
>
> > I need to redesign this patch. The pmcdev->lock taken in
> > pmc_core_acpi_pm_timer_suspend_resume might already be held by
> > pmc_core_mphy_pg_show or pmc_core_pll_show if userspace gets frozen
> > while one of these functions is executing, which will cause a hang.
> >
> > Can you tell me how to revert this patch? Or can you just do it yourself?
>
> Please submit a revert based on top of:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/log/?h=fixes
>
> with a commit message explaining why this needs to be reverted for now
> and then I will merge the revert into the fixes branch and include
> it in the next fixes pull-request to Torvalds.
Done.
>
> Regards,
>
> Hans
>
>
>
>
> > On Mon, Oct 7, 2024 at 12:57 PM Marek Maślanka <mmaslanka@xxxxxxxxxx> wrote:
> >
> > Hi Luca,
> >
> > Thanks for the report.
> >
> > It seems that the tick_freeze function in kernel/time/tick-common.c
> > is holding a spinlock, so pmc_core_acpi_pm_timer_suspend_resume
> > shouldn't try to take the mutex. I'll look for a solution.
> >
> > Marek
> >
> >
> > On Mon, Oct 7, 2024 at 11:17 AM Luca Coelho <luca@xxxxxxxxx> wrote:
> > >
> > > Hi Marek et al,
> > >
> > > We have been facing some errors when running some of our Display CI
> > > tests that seem to have been introduced by the following commit:
> > >
> > > e86c8186d03a ("platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended")
> > >
> > > The errors we are getting look like this:
> > >
> > > <4> [222.857770] =============================
> > > <4> [222.857771] [ BUG: Invalid wait context ]
> > > <4> [222.857772] 6.12.0-rc1-xe #1 Not tainted
> > > <4> [222.857773] -----------------------------
> > > <4> [222.857774] swapper/4/0 is trying to lock:
> > > <4> [222.857775] ffff8881174c88c8 (&pmcdev->lock){+.+.}-{3:3}, at: pmc_core_acpi_pm_timer_suspend_resume+0x50/0xe0 [intel_pmc_core]
> > > <4> [222.857782] other info that might help us debug this:
> > > <4> [222.857783] context-{4:4}
> > > <4> [222.857784] 1 lock held by swapper/4/0:
> > > <4> [222.857785] #0: ffffffff83452258 (tick_freeze_lock){....}-{2:2}, at: tick_freeze+0x16/0x110
> > > <4> [222.857791] stack backtrace:
> > > <4> [222.857793] CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Not tainted 6.12.0-rc1-xe #1
> > > <4> [222.857794] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR5 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
> > > <4> [222.857796] Call Trace:
> > > <4> [222.857797] <TASK>
> > > <4> [222.857798] dump_stack_lvl+0x80/0xc0
> > > <4> [222.857802] dump_stack+0x10/0x20
> > > <4> [222.857805] __lock_acquire+0x943/0x2800
> > > <4> [222.857808] ? stack_trace_save+0x4b/0x70
> > > <4> [222.857812] lock_acquire+0xc5/0x2f0
> > > <4> [222.857814] ? pmc_core_acpi_pm_timer_suspend_resume+0x50/0xe0 [intel_pmc_core]
> > > <4> [222.857817] __mutex_lock+0xbe/0xc70
> > > <4> [222.857819] ? pmc_core_acpi_pm_timer_suspend_resume+0x50/0xe0 [intel_pmc_core]
> > > <4> [222.857822] ? pmc_core_acpi_pm_timer_suspend_resume+0x50/0xe0 [intel_pmc_core]
> > > <4> [222.857825] mutex_lock_nested+0x1b/0x30
> > > <4> [222.857827] ? mutex_lock_nested+0x1b/0x30
> > > <4> [222.857828] pmc_core_acpi_pm_timer_suspend_resume+0x50/0xe0 [intel_pmc_core]
> > > <4> [222.857831] acpi_pm_suspend+0x23/0x40
> > > <4> [222.857834] clocksource_suspend+0x2b/0x50
> > > <4> [222.857836] timekeeping_suspend+0x22a/0x360
> > > <4> [222.857839] tick_freeze+0x89/0x110
> > > <4> [222.857840] enter_s2idle_proper+0x34/0x1d0
> > > <4> [222.857843] cpuidle_enter_s2idle+0xaa/0x120
> > > <4> [222.857845] ? tsc_verify_tsc_adjust+0x42/0x100
> > > <4> [222.857849] do_idle+0x221/0x250
> > > <4> [222.857852] cpu_startup_entry+0x29/0x30
> > > <4> [222.857854] start_secondary+0x12e/0x160
> > > <4> [222.857856] common_startup_64+0x13e/0x141
> > > <4> [222.857859] </TASK>
> > >
> > > And the full logs can be found, for example, here:
> > >
> > > https://intel-gfx-ci.01.org/tree/intel-xe/xe-2016-92d12099cc768f36cf676ee1b014442a5c5ba965/shard-adlp-3/igt@kms_flip@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > >
> > >
> > > Reverting this commit seems to prevent the problem. Do you have any
> > > idea what could be causing this and, more importantly, how to fix it?
> > > :)
> > >
> > > Thanks!
> > >
> > > --
> > > Cheers,
> > > Luca.
> >
>