Re: [Patch V1] x86, mce: CPU synchronization for broadcast MCE's is surprised by offline CPUs

From: Borislav Petkov
Date: Fri Sep 11 2015 - 04:46:47 EST


On Thu, Sep 10, 2015 at 08:26:38PM -0400, Ashok Raj wrote:
> +#define OFFLINE_CPU_LOG_LEN 16
> +
> +struct offline_cpu_mce {
> +	unsigned short head;
> +	unsigned short tail;
> +	struct mce mce_log[OFFLINE_CPU_LOG_LEN];
> +};
> +
> +static struct offline_cpu_mce offline_mce;
> +static unsigned int offline_mce_overflow = 0;
> +
> +/*
> + * Add mce's discovered in offline cpu which will be logged by the
> + * MCE rendezvous master. There is no lock required, since MCE's are
> + * processed one cpu at a time, sequenced by the rendezvous master CPU
> + * Safe to be called only from MCE handler.
> + */
> +static int offline_mce_add(struct mce *m)
> +{
> +	unsigned next;
> +
> +	next = (offline_mce.tail + 1) % OFFLINE_CPU_LOG_LEN;
> +	if (next == offline_mce.head) {
> +		offline_mce_overflow++;
> +		return -1;
> +	}
> +
> +	offline_mce.mce_log[offline_mce.tail] = *m;
> +	offline_mce.tail = next;
> +	return 0;
> +}
> +
> +static int offline_mce_get(struct mce *m)
> +{
> +	int ret = 0;
> +
> +	if (offline_mce.head == offline_mce.tail)
> +		goto out;
> +
> +	*m = offline_mce.mce_log[offline_mce.head];
> +	offline_mce.head = (offline_mce.head + 1) % OFFLINE_CPU_LOG_LEN;
> +
> +	ret = 1;
> +out:
> +	return ret;
> +}
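
For reference, a minimal userspace sketch of the quoted ring (not part of
the patch). struct mce is replaced by a hypothetical stub, and
offline_mce_fill() and offline_mce_selftest() are illustration-only
helpers. It shows that with the "next == head" full check one slot always
stays unused, so the 16-entry array holds at most 15 records and any
further add only bumps the overflow counter.

```c
/* Userspace model of the quoted ring; not kernel code. */
#include <assert.h>

#define OFFLINE_CPU_LOG_LEN 16

struct mce_stub {			/* hypothetical stand-in for struct mce */
	unsigned long long status;
};

struct offline_cpu_mce {
	unsigned short head;
	unsigned short tail;
	struct mce_stub mce_log[OFFLINE_CPU_LOG_LEN];
};

static struct offline_cpu_mce offline_mce;
static unsigned int offline_mce_overflow;

static int offline_mce_add(const struct mce_stub *m)
{
	unsigned int next = (offline_mce.tail + 1) % OFFLINE_CPU_LOG_LEN;

	if (next == offline_mce.head) {	/* full: one slot always stays empty */
		offline_mce_overflow++;
		return -1;
	}

	offline_mce.mce_log[offline_mce.tail] = *m;
	offline_mce.tail = next;
	return 0;
}

static int offline_mce_get(struct mce_stub *m)
{
	if (offline_mce.head == offline_mce.tail)
		return 0;		/* empty */

	*m = offline_mce.mce_log[offline_mce.head];
	offline_mce.head = (offline_mce.head + 1) % OFFLINE_CPU_LOG_LEN;
	return 1;
}

/* Illustration-only helper: try to add n records, return how many fit. */
static unsigned int offline_mce_fill(unsigned int n)
{
	unsigned int i, accepted = 0;
	struct mce_stub m;

	for (i = 0; i < n; i++) {
		m.status = i;
		if (offline_mce_add(&m) == 0)
			accepted++;
	}
	return accepted;
}

/* Illustration-only helper: 20 adds into an empty ring, then drain it. */
static void offline_mce_selftest(void)
{
	struct mce_stub m;
	unsigned int drained = 0;

	assert(offline_mce_fill(20) == OFFLINE_CPU_LOG_LEN - 1);
	assert(offline_mce_overflow == 5);

	while (offline_mce_get(&m)) {
		assert(m.status == drained);	/* records come back in order */
		drained++;
	}
	assert(drained == OFFLINE_CPU_LOG_LEN - 1);
}
```

Calling offline_mce_selftest() once on a freshly zeroed ring exercises
the full, overflow and drain paths.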

One more buffer for MCEs? Why?

We did add the mce_gen_pool thing exactly for logging stuff in atomic
context. From looking at the code, we probably could get rid of that
"struct mce_log mcelog" thing too and use only the gen_pool for logging
MCEs.

We can then get rid of that arbitrary MCE_LOG_LEN limit of 32 records and
use a nice two-page buffer which can be enlarged transparently later, if
needed.
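
To make the "enlarged later" idea concrete, here is a rough userspace
sketch. It is only a model under stated assumptions: it does not use the
kernel's genalloc API, all names (mce_pool, mce_pool_grow, mce_pool_log,
mce_stub) are hypothetical, and the chunk size assumes 4 KiB pages. The
point it illustrates is the split between a logging path that never
allocates and a separate grow step that adds another chunk when there is
room to do so.

```c
/* Userspace model only; this is NOT the kernel genalloc API. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_CHUNK_BYTES (2 * 4096)	/* assumption: two 4 KiB pages */

struct mce_stub {			/* hypothetical stand-in for struct mce */
	unsigned long long status;
};

struct pool_chunk {
	struct pool_chunk *next;
	size_t used;			/* records stored in this chunk */
	size_t cap;			/* records that fit in this chunk */
	struct mce_stub recs[];
};

struct mce_pool {
	struct pool_chunk *chunks;	/* newest chunk first */
	size_t nr_records;
};

/* Add one more chunk; the only place in this model that allocates. */
static int mce_pool_grow(struct mce_pool *p)
{
	size_t cap = (POOL_CHUNK_BYTES - sizeof(struct pool_chunk)) /
		     sizeof(struct mce_stub);
	struct pool_chunk *c = malloc(sizeof(*c) + cap * sizeof(c->recs[0]));

	if (!c)
		return -1;
	c->used = 0;
	c->cap = cap;
	c->next = p->chunks;
	p->chunks = c;
	return 0;
}

/* Log one record without allocating; fails if the newest chunk is full. */
static int mce_pool_log(struct mce_pool *p, const struct mce_stub *m)
{
	struct pool_chunk *c = p->chunks;

	if (!c || c->used == c->cap)
		return -1;
	c->recs[c->used++] = *m;
	p->nr_records++;
	return 0;
}

static void mce_pool_free(struct mce_pool *p)
{
	while (p->chunks) {
		struct pool_chunk *c = p->chunks;

		p->chunks = c->next;
		free(c);
	}
}

/* Fill one chunk, grow, log again; returns records logged after growing. */
static size_t mce_pool_demo(void)
{
	struct mce_pool pool = { NULL, 0 };
	struct mce_stub m = { 0 };
	size_t first_cap, extra;

	assert(mce_pool_log(&pool, &m) == -1);	/* no chunk yet */
	assert(mce_pool_grow(&pool) == 0);
	while (mce_pool_log(&pool, &m) == 0)
		m.status++;
	first_cap = pool.chunks->cap;
	assert(pool.nr_records == first_cap);	/* first chunk is full */

	assert(mce_pool_grow(&pool) == 0);	/* enlarge the pool */
	assert(mce_pool_log(&pool, &m) == 0);	/* logging works again */
	extra = pool.nr_records - first_cap;
	mce_pool_free(&pool);
	return extra;
}
```

In this model the caller of mce_pool_log() never sees a fixed record
count, only "pool full", and whoever calls mce_pool_grow() decides when
to add space.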

Hmmm?

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.