Re: [RFC] Avoid PIT SMP lockups

From: Zachary Amsden
Date: Mon Oct 16 2006 - 19:25:54 EST


Andi Kleen wrote:
It might only happen with SMP because the difficulty of getting good enough TSC / timer IRQ synchronization during boot increases exponentially with SMP configurations. And it might pass 10% of the time because you were lucky enough not to fire off another timer interrupt yet.

We have the same problem with NMI watchdog events unfortunately. Need to call something in the nmi watchdog code to make sure it is not renewed and then reenabled.
Or maybe it's better to figure out a way that yields atomic patches.

I think the best way is to make sure all alternative() patches
are always done before the code can be ever executed - this
means doing it very early for the main kernel. The only exception
would be the LOCK prefix patching, which should be atomic

Yes, this solves the problem in most cases. Lock patching is fine no matter when you do it. I think the problem with alternative patching in check_bugs() is that it happens way too late; the patching really has nothing at all to do with check_bugs(), and should be a separate step, probably part of setup_arch.


The paravirt-ops stuff also has some patching code. Fortunately, there, we can probably skirt the NMI issue by simply disallowing NMIs, but the issue pops up again in stop_machine_run - what happens if you take NMIs during stop_machine_run? Debug traps? Module unload is fine, but code patching done using stop_machine_run is not safe.

Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/