Re: [PATCH 4/6] x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports.

From: Havard Skinnemoen
Date: Fri Jul 11 2014 - 17:15:53 EST

On Fri, Jul 11, 2014 at 12:52 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Fri, Jul 11, 2014 at 12:06:40PM -0700, Tony Luck wrote:
>> > + if (atomic_add_unless(&mce_banks[i].poll_reader, 1, 1)) {
>> > + m.status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
>> Same as yesterday. You may skip reading a bank because someone else
>> is reading the same bank number, even though you don't share that bank
>> with them.
> Not if those banks are in a percpu variable. And this is what
> machine_check_poll gets. The ->poll_reader thing is then per-cpu too.
> For shared banks it should work also as expected since we want there
> only one reader to see the MCE signature.

If poll_reader is per-cpu, what will prevent multiple CPUs from
reading the same error from a shared bank?

>> If we are willing to be rather flexible amount when polling happens,
>> and not allow very fast poll rates. Then we could do something like
>> have the lowest numbered online cpu be the only one that sets a
>> timer. When it goes off, it scans its own banks, and then uses an
>> async cross-processor call to poke the next highest numbered
>> online cpu to have it scan banks and poke the next guy.
>> That way we know that two cpus can't be polling at the same time,
>> because we convoy them all one at a time.
> See above - those banks are percpu. And besides, mce_timer_fn already
> has the WARN_ON which otherwise be firing left and right.

mce_banks isn't currently percpu.

> It seems, Havard's issue is only with shared banks. I think they only
> cause the repeated error records.

But the problem with shared banks is that multiple CPUs read them, and
we can't tell if the errors are duplicate unless we make sure that
only one CPU gets to read and clear it at a time. Any cpu-local
synchronization mechanism isn't going to work.

If you're worried about disabling interrupts, it's possible we don't
really need to make the spinlocks irqsafe. I'm not sure if we had any
reason for that other than "just to be safe".

Or we could keep the (irqsafe) spinlocks but move the clearing much
earlier. There may have been a reason why the current code clears the
bank status last though -- perhaps we also need to read out all the
state while we hold the lock, before we clear the status bit.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at