Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way

From: Borislav Petkov
Date: Mon Apr 08 2019 - 13:51:53 EST


On Mon, Apr 08, 2019 at 02:12:16PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
>
> Current AMD systems have unique MCA banks per logical CPU even though
> the type of the banks may all align to the same bank number. Each CPU
> will have control of a set of MCA banks in the hardware and these are
> not shared with other CPUs.
>
> For example, bank 0 may be the Load-Store Unit on every logical CPU, but
> each bank 0 is a unique structure in the hardware. In other words, there
> isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs
> share.
>
> This idea extends even to non-core MCA banks. For example, CPU0 and CPU4
> may see a Unified Memory Controller at bank 15, but each CPU is actually
> seeing a unique hardware structure that is not shared with other CPUs.
>
> Because the MCA banks are all unique hardware structures, it would be
> good to control them in a more granular way. For example, if there is a
> known issue with the Floating Point Unit on CPU5 and a user wishes to
> disable an error type on the Floating Point Unit, then it would be good
> to do this only for CPU5 rather than all CPUs.
>
> Also, future AMD systems may have heterogeneous MCA banks. Meaning the
> bank numbers may not necessarily represent the same types between CPUs.
> For example, bank 20 visible to CPU0 may be a Unified Memory Controller
> and bank 20 visible to CPU4 may be a Coherent Slave. So granular control
> will be even more necessary should the user wish to control specific MCA
> banks.
>
> Split the device attributes from struct mce_bank leaving only the MCA
> bank control fields.
>
> Make struct mce_banks[] per_cpu in order to have more granular control
> over individual MCA banks in the hardware.
>
> Allocate the device attributes statically based on the maximum number of
> MCA banks supported. The sysfs interface will use as many as needed per
> CPU. Currently, this is set to mca_cfg.banks, but will be changed to a
> per_cpu bank count in a future patch.
>
> Allocate the MCA control bits dynamically. Use the maximum number of MCA
> banks supported for now. This will be changed to a per_cpu bank count in
> a future patch.
>
> Redo the sysfs store/show functions to handle the per_cpu mce_banks[].
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> ---
> arch/x86/kernel/cpu/mce/core.c | 77 ++++++++++++++++++++++------------
> 1 file changed, 51 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 8d0d1e8425db..14583c5c6e12 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -64,16 +64,21 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
>
> DEFINE_PER_CPU(unsigned, mce_exception_count);
>
> +struct mce_bank {
> + u64 ctl; /* subevents to enable */
> + bool init; /* initialise bank? */

Keep that vertical alignment as of that of the members if mce_bank_dev
below.

> +};
> +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank*, mce_banks);

Space between mce_bank and *.

> +
> #define ATTR_LEN 16
> /* One object for each MCE bank, shared by all CPUs */
> -struct mce_bank {
> - u64 ctl; /* subevents to enable */
> - bool init; /* initialise bank? */
> +struct mce_bank_dev {
> struct device_attribute attr; /* device attribute */
> char attrname[ATTR_LEN]; /* attribute name */
> + u8 bank; /* bank number */
> };
> +static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];

What bothers me here is the connection between the mce_bank and the
mce_bank_dev: it is simply not there.

Why isn't there a

struct mce_bank_dev *dev;

in struct mce_bank?

Because - and correct me if I'm wrong here - but I think if we do
per-CPU banks, then we need to selectively point from each mce_bank to
its corresponding mce_bank_dev descriptor so that you have the proper
names.

For example, if bank3 on CPU5 is not present/disabled/N/A/whatever, then
you need to not initialize the that sysfs file there and have:

/sys/devices/system/machinecheck/machinecheck5/
âââ bank0
âââ bank1
âââ bank10
âââ bank11
âââ bank12
âââ bank13
âââ bank14
âââ bank15
âââ bank16
âââ bank17
âââ bank18
âââ bank19
âââ bank2
âââ bank20
âââ bank21
âââ bank22
<--- bank 3 is not there because unsupported.
âââ bank4
âââ bank5
âââ bank6
âââ bank7
âââ bank8
âââ bank9


Which means that mce_device_create() should learn to be able to create
non-contiguous per-CPU bank sysfs files so that you'll have to iterate
over the per-CPU struct mce_banks array and use only those mce_bank_dev
* pointers which represent present banks on this CPU only.

Yes, no, am I way off?

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.