Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

From: Borislav Petkov
Date: Thu Sep 24 2015 - 14:52:55 EST


On Thu, Sep 24, 2015 at 06:44:25PM +0000, Luck, Tony wrote:
> > Now that we have this shiny 2-pages sized lockless gen_pool, why are we
> > still dealing with struct mce_log mcelog? Why can't we rip it out and
> > kill it finally? And switch to the gen_pool?
> >
> > All code that reads from mcelog - /dev/mcelog chrdev - should switch to
> > the lockless buffer and will iterate through the logged MCEs there.
>
> I think we have a problem of when to delete entries ... we can only do that
> when all the interested consumers of logs have seen an entry. But we have
> no control in the kernel on consumption from /dev/mcelog.
>
> Historic semantics was that the first MCE_LOG_LEN errors would sit
> in the buffer waiting for userspace to begin running a daemon to read
> them.

Right, we can tag them with various flags when iterating over them in
the gen_pool. The in-kernel consumers can look at them, modify them,
update the information, etc.

Userspace can then consume them and delete them.
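A minimal userspace sketch of the tagging idea. It assumes a fixed array stands in for the gen_pool and that each consumer owns one bit in a per-record "seen" mask, so that a record is freed only once every interested consumer has processed it (the deletion concern raised in the quoted text). All names here (struct mce_rec, CONSUMER_*, log_record(), kernel_pass(), user_consume()) are hypothetical and not kernel APIs. A real version would sit on the lockless gen_pool and need proper synchronization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical consumer bits, one per interested reader (not kernel flags). */
#define CONSUMER_KERN_A (1u << 0)  /* an in-kernel consumer */
#define CONSUMER_KERN_B (1u << 1)  /* another in-kernel consumer */
#define CONSUMER_USER   (1u << 2)  /* the userspace reader */
#define ALL_CONSUMERS   (CONSUMER_KERN_A | CONSUMER_KERN_B | CONSUMER_USER)

#define NR_RECORDS 8  /* hypothetical capacity; stands in for the gen_pool */

struct mce_rec {
	unsigned long long status;  /* stand-in for the logged MCE data */
	unsigned int seen;          /* bitmask of consumers that processed it */
	bool used;
};

static struct mce_rec pool[NR_RECORDS];

/* Store a new record in the first free slot; false if the pool is full. */
static bool log_record(unsigned long long status)
{
	for (size_t i = 0; i < NR_RECORDS; i++) {
		if (!pool[i].used) {
			pool[i] = (struct mce_rec){ .status = status, .used = true };
			return true;
		}
	}
	return false;
}

/* Tag a record as processed by one consumer. */
static void mark_seen(struct mce_rec *r, unsigned int consumer)
{
	assert(consumer & ALL_CONSUMERS);
	r->seen |= consumer;
}

/* An in-kernel consumer walks all live records and tags them. */
static void kernel_pass(unsigned int consumer)
{
	for (size_t i = 0; i < NR_RECORDS; i++)
		if (pool[i].used)
			mark_seen(&pool[i], consumer);
}

/*
 * Userspace read: tag each live record as seen by userspace, then free
 * only the records every consumer has seen. Returns the number freed.
 */
static size_t user_consume(void)
{
	size_t freed = 0;

	for (size_t i = 0; i < NR_RECORDS; i++) {
		struct mce_rec *r = &pool[i];

		if (!r->used)
			continue;
		mark_seen(r, CONSUMER_USER);
		if ((r->seen & ALL_CONSUMERS) == ALL_CONSUMERS) {
			r->used = false;
			r->seen = 0;
			freed++;
		}
	}
	return freed;
}
```

With two records logged and only CONSUMER_KERN_A having run, user_consume() frees nothing; once CONSUMER_KERN_B has also tagged them, the next user_consume() frees both.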

If new ones get logged in the meantime and userspace hasn't yet managed
to consume and delete the present ones, we overwrite the oldest ones
and set MCE_OVERFLOW, like mce_log does now for mcelog. That's no
different in functionality from what we have now.
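A minimal sketch of the overflow policy described above: a fixed-size ring that, when full, overwrites the oldest record and sets an overflow flag. RING_LEN, RING_OVERFLOW (a stand-in for MCE_OVERFLOW), struct mce_ring, ring_log() and ring_consume() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_LEN      4         /* hypothetical capacity */
#define RING_OVERFLOW (1u << 0) /* hypothetical stand-in for MCE_OVERFLOW */

struct mce_ring {
	unsigned long long rec[RING_LEN];
	size_t head;        /* index of the oldest valid record */
	size_t count;       /* number of valid records */
	unsigned int flags;
};

/* Log a record; when the ring is full, overwrite the oldest and flag it. */
static void ring_log(struct mce_ring *r, unsigned long long status)
{
	if (r->count == RING_LEN) {
		/* When full, the oldest slot is also the next write slot. */
		r->rec[r->head] = status;
		r->head = (r->head + 1) % RING_LEN; /* next-oldest is now oldest */
		r->flags |= RING_OVERFLOW;
		return;
	}
	r->rec[(r->head + r->count) % RING_LEN] = status;
	r->count++;
}

/* Userspace read: remove and return the oldest record; false if empty. */
static bool ring_consume(struct mce_ring *r, unsigned long long *out)
{
	assert(out != NULL);
	if (r->count == 0)
		return false;
	*out = r->rec[r->head];
	r->head = (r->head + 1) % RING_LEN;
	r->count--;
	return true;
}
```

Logging six records into a four-entry ring leaves records 3 to 6 with RING_OVERFLOW set. Note that the quoted historic semantics kept the first MCE_LOG_LEN errors instead, so whether to keep the oldest or the newest records is a policy choice; this sketch follows the overwrite-oldest variant proposed here.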

The advantage is that in-kernel consumers get to look at them first
and we keep all MCE records concentrated in one place.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.