RE: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period

From: Luck, Tony
Date: Fri Oct 02 2020 - 12:04:21 EST

Next message: Mark Brown: "Re: [RESEND PATCH] spmi: prefix spmi bus device names with "spmi""
Previous message: Manivannan Sadhasivam: "Re: [PATCH] net: qrtr: ns: Fix the incorrect usage of rcu_read_lock()"
In reply to: James Morse: "Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> Because from my x86 CPUs limited experience, the cache arrays are mostly
> fine and errors reported there are not something that happens very
> frequently so we don't even need to collect and count those.

On Intel x86 we leave the counting and threshold decisions about cache
health to the hardware. When a cache reaches its error threshold, the
hardware logs a "yellow" status instead of "green" in the machine check
bank (the error is still marked as "corrected"). The mcelog(8) daemon may
then attempt to take the CPUs that share that cache offline.

See Intel SDM volume 3B, section "15.4 Enhanced Cache Error Reporting".
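
For illustration, here is a rough sketch of how software could read that
green/yellow indication out of a machine check bank status value. It
assumes the SDM layout as I understand it: IA32_MCG_CAP bit 11 (MCG_TES_P)
advertises the feature, and IA32_MCi_STATUS bits 54:53 carry the
threshold-based error status (01 = green, 10 = yellow) for corrected
errors. The macro, enum and function names are made up for this example
and are not taken from the kernel or from mcelog. The helper also does not
check that the bank's error code actually describes a cache error, which
real code would need to do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the Intel SDM; names are illustrative only. */
#define EX_MCG_CAP_TES_P        (1ULL << 11) /* threshold-based status supported */
#define EX_MCI_STATUS_VAL       (1ULL << 63) /* bank contains a valid error log */
#define EX_MCI_STATUS_UC        (1ULL << 61) /* error was uncorrected */
#define EX_MCI_STATUS_TES_SHIFT 53           /* bits 54:53 */
#define EX_MCI_STATUS_TES_MASK  0x3ULL

enum ex_cache_health {
	EX_CACHE_HEALTH_UNKNOWN, /* not tracked, not valid, or reserved encoding */
	EX_CACHE_HEALTH_GREEN,   /* corrected errors below the hardware threshold */
	EX_CACHE_HEALTH_YELLOW,  /* corrected errors reached the hardware threshold */
};

/*
 * Decode the threshold-based error status from raw IA32_MCG_CAP and
 * IA32_MCi_STATUS values. The field is only meaningful for valid,
 * corrected errors on CPUs that advertise MCG_TES_P.
 */
static enum ex_cache_health ex_decode_cache_health(uint64_t mcg_cap,
						   uint64_t status)
{
	if (!(mcg_cap & EX_MCG_CAP_TES_P))
		return EX_CACHE_HEALTH_UNKNOWN;
	if (!(status & EX_MCI_STATUS_VAL))
		return EX_CACHE_HEALTH_UNKNOWN;
	if (status & EX_MCI_STATUS_UC)
		return EX_CACHE_HEALTH_UNKNOWN;

	switch ((status >> EX_MCI_STATUS_TES_SHIFT) & EX_MCI_STATUS_TES_MASK) {
	case 1:
		return EX_CACHE_HEALTH_GREEN;
	case 2:
		return EX_CACHE_HEALTH_YELLOW;
	default: /* 0 = no tracking, 3 = reserved */
		return EX_CACHE_HEALTH_UNKNOWN;
	}
}
```

A consumer would call ex_decode_cache_health() with the values read from
the two MSRs and treat a "yellow" result as the trigger for a policy
decision, such as offlining the CPUs that share the cache.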

-Tony
