Re: [PATCH DEBUG] x86, pat/mtrr: MTRR/PAT init earlier for each APs

From: Chen Yu
Date: Sun Jan 15 2017 - 21:57:04 EST


Hi experts,
do you have any clue on this? thanks.

On Tue, Dec 20, 2016 at 06:21:28PM +0800, Chen Yu wrote:
> This is a debug patch to descibe/workaround the issue
> we encountered recently.
>
> Problem and the cause:
> Currently we are suffering from *extremely* slow CPU online
> speed during system resuming from S3. Say, the MacBookPro 2015
> has 4 CPUs, and it took more than 1 second each for both CPU1
> and CPU3 to be brought back to idle thread again. Further ftrace
> result showed that, *each* instruction the CPU1 and CPU3 execute
> will take much longer time than it will take during normal cpu
> online via sysfs(without S3 involved). And more interesting
> thing was found that after resumed back, every instruction CPU1 and
> CPU3 execute is back to its normal speed(unixbench has the same score
> before/after S3). So it smells like there is something wrong with
> the cache/tlb settings only during resuming back from S3.
> Finally we have found this might be related to BIOS who has
> scribbled the mtrr/pat before it resumed back to the OS, and every
> instruction seems to be run in an uncached behavior, fortunately
> later after all the APs have been brought up again, mtrr_aps_init()
> will be invoked to synchronize the mtrr on these APs to the value
> once saved by CPU0 before suspended, thus everything is back
> to normal after resumed.
>
> Workaround:
> So it turns out to be that if we can synchronize the APs with boot CPU
> ASAP, rather than waiting till all CPUS online, it might reduce the
> impact of the bogus BIOS who scribbled the mtrr/pat. So here is the
> hack patch to let the users to synchronize mtrr on APs earlier.
> With the following debug patch applied, the resume time for CPU1 and
> CPU3 have dropped a lot.
>
> (Notice, the following result were tested with ftrace function_graph enabled
> during suspend/resume, by this tool:
> https://01.org/suspendresume
>
>
> Before patch applied:
> [ 619.810899] Enabling non-boot CPUs ...
> [ 619.825528] x86: Booting SMP configuration:
> [ 619.825537] smpboot: Booting Node 0 Processor 1 APIC 0x2
> -------skip--------
> [ 621.723809] CPU1 is up
> [ 621.762843] smpboot: Booting Node 0 Processor 2 APIC 0x1
> -------skip--------
> [ 621.766679] CPU2 is up
> [ 621.840228] smpboot: Booting Node 0 Processor 3 APIC 0x3
> -------skip--------
> [ 626.690900] CPU3 is up
>
> So it took CPU1 621.723809 - 619.825537 = 1898.272 ms, and
> CPU3 626.690900 - 621.840228 = 4850.672 ms !
>
>
> After patch applied:
> [ 106.931790] smpboot: Booting Node 0 Processor 1 APIC 0x2
> -------skip--------
> [ 106.948360] CPU1 is up
> [ 106.963975] smpboot: Booting Node 0 Processor 2 APIC 0x1
> -------skip--------
> [ 106.968087] CPU2 is up
> [ 106.986534] smpboot: Booting Node 0 Processor 3 APIC 0x3
> -------skip--------
> [ 106.990702] CPU3 is up
>
> It took CPU1 106.948360 - 106.931790 = 16.57 ms, and
> CPU3 106.990702 - 106.986534 = 4.16 ms
>
> Question:
> So it turns out to be a BIOS issue, but Linux should also deal with
> this bogus BIOS, right? I studied the commit we delay the synchronization
> until all the APs are brought up, and according to:
>
> Commit d0af9eed5aa9 ("x86, pat/mtrr: Rendezvous all the cpus
> for MTRR/PAT init")
>
> It seems that there would be problem if we do not sync APs at the same
> time(some CPUs run with cache disabled will hang the system, because its
> sibling is trying to adjust the mtrr which might disable its cache) on
> some special platforms? But I have a question that, even in our current
> solution which defers the synchronization, the scenario mentioned above
> can not be avoided because at the time CPU3 is trying to restore mtrr,
> its sibling CPU1 might also be doing some kworker or ticking tasks,
> the CPU1 might also run with cache disabled?
> I'm not sure if I understand the code correctly, and it would be
> appreciated if people could give any comments/suggestions on how to deal
> with this situation found on MacProBook, and if you need me to do any test
> please feel free to let me know.
>
> Thanks,
> Yu
>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: Len Brown <len.brown@xxxxxxxxx>
> Cc: "Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx>
> Cc: Suresh Siddha <sbsiddha@xxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxx>
> Cc: Lukas Wunner <lukas@xxxxxxxxx>
> Cc: "Brandt, Todd E" <todd.e.brandt@xxxxxxxxx>
> Cc: Rui Zhang <rui.zhang@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>
> ---
> arch/x86/kernel/cpu/mtrr/main.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index 24e87e7..eddaa89 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -813,15 +813,28 @@ void mtrr_save_state(void)
> put_online_cpus();
> }
>
> +static bool __read_mostly no_aps_delay;
> +
> +static int __init no_aps_setup(char *str)
> +{
> + no_aps_delay = true;
> + pr_info("hack: do not delay aps mtrr/pat initialization.\n");
> +
> + return 0;
> +}
> +
> void set_mtrr_aps_delayed_init(void)
> {
> if (!mtrr_enabled())
> return;
> if (!use_intel())
> return;
> + if (no_aps_delay)
> + return;
>
> mtrr_aps_delayed_init = true;
> }
> +early_param("no_aps_delay", no_aps_setup);
>
> /*
> * Delayed MTRR initialization for all AP's
> --
> 2.7.4
>