Re: [PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
From: Borislav Petkov
Date: Thu Feb 27 2025 - 15:48:29 EST
On February 27, 2025 8:59:33 PM GMT+01:00, Yazen Ghannam <yazen.ghannam@xxxxxxx> wrote:
>On Thu, Feb 27, 2025 at 08:33:19PM +0100, Borislav Petkov wrote:
>> On February 27, 2025 5:31:48 PM GMT+01:00, Yazen Ghannam <yazen.ghannam@xxxxxxx> wrote:
>> >On Thu, Feb 27, 2025 at 04:25:00PM +0100, Borislav Petkov wrote:
>> >> On Thu, Feb 13, 2025 at 04:45:55PM +0000, Yazen Ghannam wrote:
>> >> > Also, move __mcheck_cpu_init_generic() after
>> >> > __mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
>> >> > MCA polling event.
>> >>
>> >> The reason being?
>> >>
>> >> Precaution?
>> >>
>> >> It was this way since forever, why are you moving it now? Any particular
>> >> reason?
>> >>
>> >
>> >1) To read/clear old errors before turning on MCA. The updated
>> >__mcheck_cpu_init_prepare_banks() function does this for the MCi_CTL
>> >registers. This patch does this for the MCG_CTL register too.
>> >
>> >2) To ensure that vendor-specific setup is finished beforehand also.
>>
>> That doesn't answer my question. All of the above gets done even without shuffling the order...
>>
>>
>
>MCA banks can start logging errors once MCG_CTL is set. The AMD docs say
>"The operating system must initialize the MCA_CONFIG registers prior to
>initialization of the MCA_CTL registers."
>
>"The MCA_CTL registers must be initialized prior to enabling the error
>reporting banks in MCG_CTL".
>
>However, the Intel docs "Machine-Check Initialization Pseudocode" say
>MCG_CTL first then MCi_CTL.
>
>But both agree that CR4.MCE should be set last.
>
>We have an old thread on the topic that led to this patch.
>https://lore.kernel.org/all/YqJHwXkg3Ny9fI3s@yaz-fattaah/
>
>And it seemed okay at the time.
>https://lore.kernel.org/all/YrnTMmwl5TrHwT9J@xxxxxxx/
>
>I don't think anything much has changed since then, so I included the
>old patch again in this set.
>
>Thanks,
>Yazen
This is exactly what needs to be in the commit message: why the change is being done.
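
For illustration only, the ordering constraints quoted above from the AMD documentation (MCA_CONFIG before MCA_CTL, MCA_CTL before MCG_CTL, CR4.MCE last) could be sketched as a small standalone C model. This is not kernel code: the enum, the struct and every function name below are hypothetical, no MSR is actually touched, and placing the "clear old errors" step before MCA_CTL is an assumption based on the description of __mcheck_cpu_init_prepare_banks() in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step identifiers; these are not kernel symbols. */
enum mca_step {
	STEP_MCA_CONFIG,	/* per-bank MCA_CONFIG setup (AMD) */
	STEP_CLEAR_STALE,	/* read and clear old errors (assumed position) */
	STEP_MCA_CTL,		/* per-bank MCi_CTL / MCA_CTL */
	STEP_MCG_CTL,		/* global enable of the reporting banks */
	STEP_CR4_MCE,		/* CR4.MCE, set last */
};

#define MAX_LOG 16

/* Records the order in which the modelled steps were performed. */
struct init_log {
	enum mca_step steps[MAX_LOG];
	size_t n;
};

static void record(struct init_log *log, enum mca_step s)
{
	assert(log->n < MAX_LOG);
	log->steps[log->n++] = s;
}

/* Model of the AMD-documented ordering as quoted in the thread. */
static void mca_init_amd_order(struct init_log *log)
{
	record(log, STEP_MCA_CONFIG);
	record(log, STEP_CLEAR_STALE);
	record(log, STEP_MCA_CTL);
	record(log, STEP_MCG_CTL);
	record(log, STEP_CR4_MCE);
}

/* Return the position of the first occurrence of a step, or -1. */
static int step_index(const struct init_log *log, enum mca_step s)
{
	for (size_t i = 0; i < log->n; i++)
		if (log->steps[i] == s)
			return (int)i;
	return -1;
}

/*
 * Return 1 if the log satisfies the quoted AMD constraints:
 * MCA_CONFIG < MCA_CTL < MCG_CTL, with CR4.MCE as the final step.
 */
static int order_ok(const struct init_log *log)
{
	int cfg = step_index(log, STEP_MCA_CONFIG);
	int ctl = step_index(log, STEP_MCA_CTL);
	int mcg = step_index(log, STEP_MCG_CTL);
	int cr4 = step_index(log, STEP_CR4_MCE);

	return cfg >= 0 && cfg < ctl && ctl < mcg && mcg < cr4 &&
	       cr4 == (int)log->n - 1;
}
```

Calling mca_init_amd_order() on an empty log and then order_ok() returns 1. A log that records MCG_CTL before MCA_CTL, as in the Intel pseudocode mentioned above, fails the same check, which is the mismatch between the two vendors' documents that the reordering is about.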
--
Sent from a small device: formatting sucks and brevity is inevitable.