Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts
From: Raj, Ashok
Date: Thu Sep 24 2015 - 16:22:47 EST
On Thu, Sep 24, 2015 at 09:22:24PM +0200, Borislav Petkov wrote:
> Ah, we return. But we shouldn't return - we should overwrite. I believe
> we've talked about the policy of overwriting old errors with new ones.
Another reason I had a separate buffer in my earlier patch was to avoid
calling RCU functions from the offline CPU. I had an offline discussion
with Paul McKenney, and he said don't do that...
mce_gen_pool_add()->gen_pool_alloc() calls rcu_read_lock() and such,
so it didn't seem appropriate.
Also, the function doesn't seem safe to be called in NMI context. Although
MCE is different, for all intents and purposes we should treat both as the
same priority. The old style log is simple and tested in those cases.
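
For illustration only (this is not code from the patch series or from the
kernel), a simple fixed-size log that needs no locks and no RCU can reserve
slots with a compare-and-swap, and just return when it is full. The sketch
below is userspace C11; struct simple_log, simple_log_add() and LOG_LEN are
hypothetical names, and the real old-style log may differ in detail.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define LOG_LEN 4u  /* hypothetical capacity, kept small for illustration */

struct simple_rec {
	unsigned long long status;  /* stand-in for a bank status value */
	atomic_bool finished;       /* set once the record is fully written */
};

struct simple_log {
	atomic_uint next;           /* index of the next free slot */
	struct simple_rec entry[LOG_LEN];
};

/* Returns true if the record was logged, false if the log is full. */
bool simple_log_add(struct simple_log *slog, unsigned long long status)
{
	unsigned int idx = atomic_load(&slog->next);

	for (;;) {
		if (idx >= LOG_LEN)
			return false;   /* full: the record is dropped */
		/* Try to claim slot idx; on failure idx is reloaded, so retry. */
		if (atomic_compare_exchange_weak(&slog->next, &idx, idx + 1))
			break;
	}

	slog->entry[idx].status = status;
	/* Readers should only trust entries that are marked finished. */
	atomic_store(&slog->entry[idx].finished, true);
	return true;
}
```

The only shared state is the slot index, so a writer never blocks and never
allocates; the price is that new records are dropped once the log fills up.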
I like everything you say below... something we could do as our next phase
of improving logging and might need more careful work to build it right.
Just like how MC banks have overwrite rules, we can possibly do something
like that if the buffer fills up.
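
As a rough illustration of that idea (one possible policy, not the hardware's
exact overwrite rules and not a proposed patch): when the buffer is full, a
more severe record may replace a less severe one, and anything else is
dropped with an overflow flag set. All names below (err_pool, err_pool_add(),
the two severity levels) are hypothetical, and the sketch is single-threaded
userspace C, so it says nothing about NMI or #MC safety.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_LEN 4  /* hypothetical capacity */

enum err_sev { SEV_CORRECTED = 0, SEV_UNCORRECTED = 1 };  /* hypothetical */

struct err_rec {
	enum err_sev sev;
	unsigned long long status;
	bool used;
};

struct err_pool {
	struct err_rec rec[POOL_LEN];
	bool overflow;  /* set when a record was dropped or overwritten */
};

/* Returns true if the record was stored, false if it was dropped. */
bool err_pool_add(struct err_pool *p, enum err_sev sev,
		  unsigned long long status)
{
	struct err_rec *slot = NULL;

	for (size_t i = 0; i < POOL_LEN; i++) {
		if (!p->rec[i].used) {
			slot = &p->rec[i];      /* a free slot always wins */
			break;
		}
		/* Otherwise remember the first record less severe than ours. */
		if (!slot && p->rec[i].sev < sev)
			slot = &p->rec[i];
	}

	if (!slot) {
		p->overflow = true;             /* nothing we may overwrite */
		return false;
	}
	if (slot->used)
		p->overflow = true;             /* an older record is lost */

	slot->sev = sev;
	slot->status = status;
	slot->used = true;
	return true;
}
```

With this policy a flood of corrected errors cannot push out an uncorrected
one, but the first corrected record can still be lost, which is exactly the
trade-off discussed in the quoted text below.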
> TBH, I don't think there's a 100%-correct policy to act according to
> when our error logging buffers are full:
> - we can overwrite old errors with new but then this way we might lose
> the one important error record with which it all started.
> - if we don't overwrite, we might fill up with "unimportant" correctable
> error records and miss other, more important ones which happen now
> - ...
> We could try to implement some cheap heuristics which decide what and
> when to overwrite but I'm sceptical it'll be always correct...
> ECO tip #101: Trim your mails when you reply.