Re: [PATCH v2 5/6] x86/xen: Add a Xen-specific sync_core() implementation

From: Borislav Petkov
Date: Fri Dec 02 2016 - 14:21:07 EST


On Fri, Dec 02, 2016 at 11:03:50AM -0800, Linus Torvalds wrote:
> I'd really rather rjust mark it noinline with a comment. That way the
> return from the function acts as the control flow change.

Something like below?

It boots in a guest but that doesn't mean anything.

> 'sync_core()' doesn't help for other CPU's anyway, you need to do the
> cross-call IPI. So worrying about other CPU's is *not* a valid reason
> to keep a "sync_core()" call.

Yeah, no, I'm not crazy about it either - I was just sanity-checking all
call sites of apply_alternatives(). But as you say, we would've gotten
much bigger problems if other CPUs would walk in there on us.

> Seriously, the only reason I can see for "sync_core()" really is:
>
> - some deep non-serialized MSR access or similar (ie things like
> firmware loading etc really might want it, and a mchine check might
> want it)

Yah, we do it in the #MC handler - apparently we need it there - and
in the microcode loader to tickle out the version of the microcode
currently applied into the MSR.

> The issues with modifying code while another CPU may be just about to
> access it is a separate issue. And as noted, "sync_core()" is not
> sufficient for that, you have to do a whole careful dance with
> single-byte debug instruction writes and then a final cross-call.
>
> See the whole "text_poke_bp()" and "text_poke()" for *that* whole
> dance. That's a much more complex thing from the normal
> apply_alternatives().

Yeah.

---
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5cb272a7a5a3..b1d0c35e6dcb 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -346,7 +346,6 @@ static void __init_or_module optimize_nops(struct alt_instr *a, u8 *instr)

local_irq_save(flags);
add_nops(instr + (a->instrlen - a->padlen), a->padlen);
- sync_core();
local_irq_restore(flags);

DUMP_BYTES(instr, a->instrlen, "%p: [%d:%d) optimized NOPs: ",
@@ -359,9 +358,12 @@ static void __init_or_module optimize_nops(struct alt_instr *a, u8 *instr)
* This implies that asymmetric systems where APs have less capabilities than
* the boot processor are not handled. Tough. Make sure you disable such
* features by hand.
+ *
+ * Marked "noinline" to cause control flow change and thus insn cache
+ * to refetch changed I$ lines.
*/
-void __init_or_module apply_alternatives(struct alt_instr *start,
- struct alt_instr *end)
+void __init_or_module noinline apply_alternatives(struct alt_instr *start,
+ struct alt_instr *end)
{
struct alt_instr *a;
u8 *instr, *replacement;
@@ -667,7 +669,6 @@ void *__init_or_module text_poke_early(void *addr, const void *opcode,
unsigned long flags;
local_irq_save(flags);
memcpy(addr, opcode, len);
- sync_core();
local_irq_restore(flags);
/* Could also do a CLFLUSH here to speed up CPU recovery; but
that causes hangs on some VIA CPUs. */

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.