On 2024/9/20 19:44, Zhuo, Qiuxu wrote:
From: Tony W Wang-oc <TonyWWang-oc@xxxxxxxxxxx>
[...]
--- a/arch/x86/kernel/cpu/mce/zhaoxin.c
+++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
@@ -63,3 +63,21 @@ void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
 {
 	intel_clear_lmce();
 }
+
+void mce_zhaoxin_handle_storm(int bank, bool on)
+{
+	unsigned long flags;
+	u64 val;
+
+	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+	rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	if (on) {
+		val &= ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK);
+		val |= CMCI_STORM_THRESHOLD;
+	} else {
+		val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
+		val |= (MCI_CTL2_CMCI_EN | cmci_threshold[bank]);
+	}
+	wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+}
Are there any reasons or comments why it needs to disable/enable the
CMCI interrupt here during a CMCI storm on/off? If not, then reuse
mce_intel_handle_storm() to avoid duplicating the code.
As explained in another email.
The reason is actually mentioned in the cover letter: "because Zhaoxin's UCR
error is not reported through CMCI", and we want to disable the CMCI interrupt
when a CMCI storm happens.
So you just want to disable CMCI when a CMCI storm happens.
This doesn't explain much to me.
What is the problem if CMCI is not disabled when a CMCI storm happens?
In practice, we have encountered a lot of CEs, such as DRAM CEs, so it
feels safer to disable the CMCI interrupt than to set a large threshold.
Also, Zhaoxin's UCR errors are not reported through CMCI, so disabling the
CMCI interrupt during a storm should not hide any UCR reports. That is why
we implemented it this way.
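To make the difference between the two approaches concrete, here is a minimal userspace sketch of the MCx_CTL2 bit manipulation only. It is not kernel code: there is no MSR access and no locking. The helper names ctl2_storm_intel(), ctl2_storm_zhaoxin() and ctl2_selfcheck() are hypothetical. The constant values are my assumptions about what the kernel headers define. The Intel-style helper reflects my understanding of what mce_intel_handle_storm() does (it changes only the threshold), not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; check the kernel headers before relying on them. */
#define MCI_CTL2_CMCI_EN             (1ULL << 30)
#define MCI_CTL2_CMCI_THRESHOLD_MASK 0x7fffULL
#define CMCI_STORM_THRESHOLD         32749ULL

/* Intel-style (assumed): only swap the threshold, leave CMCI_EN as it is. */
static uint64_t ctl2_storm_intel(uint64_t val, bool on, uint64_t normal_thresh)
{
	val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
	return val | (on ? CMCI_STORM_THRESHOLD : normal_thresh);
}

/* Zhaoxin-style (this patch): additionally clear CMCI_EN while the storm is on. */
static uint64_t ctl2_storm_zhaoxin(uint64_t val, bool on, uint64_t normal_thresh)
{
	if (on) {
		val &= ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK);
		val |= CMCI_STORM_THRESHOLD;
	} else {
		val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
		val |= MCI_CTL2_CMCI_EN | normal_thresh;
	}
	return val;
}

/* Usage example: start with CMCI enabled and a threshold of 1. */
static void ctl2_selfcheck(void)
{
	const uint64_t start = MCI_CTL2_CMCI_EN | 1;
	uint64_t intel_on = ctl2_storm_intel(start, true, 1);
	uint64_t zx_on = ctl2_storm_zhaoxin(start, true, 1);

	/* Intel-style keeps the interrupt enabled during the storm. */
	assert(intel_on & MCI_CTL2_CMCI_EN);
	/* Zhaoxin-style disables it during the storm ... */
	assert(!(zx_on & MCI_CTL2_CMCI_EN));
	/* ... and both return to the original value when the storm ends. */
	assert(ctl2_storm_intel(intel_on, false, 1) == start);
	assert(ctl2_storm_zhaoxin(zx_on, false, 1) == start);
}
```

In this sketch the only difference between the two paths is the MCI_CTL2_CMCI_EN bit during the storm. The threshold field is handled identically.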
Sincerely
TonyWWang-oc