Re: [PATCH v3 5/6] x86/alternative: Use a single access in text_poke() where possible

From: Nadav Amit
Date: Thu Jan 10 2019 - 04:33:23 EST


> On Jan 9, 2019, at 2:59 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
> Static call inline patching will need to use single 32-bit writes.
> Change text_poke() to do so where possible.
>
> Signed-off-by: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> ---
> arch/x86/kernel/alternative.c | 31 ++++++++++++++++++++++++++++---
> 1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index ebeac487a20c..607f48a90097 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -692,7 +692,7 @@ void *__init_or_module text_poke_early(void *addr, const void *opcode,
> void *text_poke(void *addr, const void *opcode, size_t len)
> {
> unsigned long flags;
> - char *vaddr;
> + unsigned long vaddr;
> struct page *pages[2];
> int i;
>
> @@ -714,14 +714,39 @@ void *text_poke(void *addr, const void *opcode, size_t len)
> }
> BUG_ON(!pages[0]);
> local_irq_save(flags);
> +
> set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> if (pages[1])
> set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> - vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> - memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> +
> + vaddr = fix_to_virt(FIX_TEXT_POKE0) + ((unsigned long)addr & ~PAGE_MASK);
> +
> + /*
> + * Use a single access where possible. Note that a single unaligned
> + * multi-byte write will not necessarily be atomic on x86-32, or if the
> + * address crosses a cache line boundary.
> + */
> + switch (len) {
> + case 1:
> + WRITE_ONCE(*(u8 *)vaddr, *(u8 *)opcode);
> + break;
> + case 2:
> + WRITE_ONCE(*(u16 *)vaddr, *(u16 *)opcode);
> + break;
> + case 4:
> + WRITE_ONCE(*(u32 *)vaddr, *(u32 *)opcode);
> + break;
> + case 8:
> + WRITE_ONCE(*(u64 *)vaddr, *(u64 *)opcode);
> + break;
> + default:
> + memcpy((void *)vaddr, opcode, len);
> + }
> +

Even if Intel and AMD CPUs are guaranteed to run instructions from L1
atomically, this may break instruction emulators, such as those that
hypervisors use. They might not read instructions atomically if on SMP VMs
when the VM's text_poke() races with the emulated instruction fetch.

While I can't find a reason for hypervisors to emulate this instruction,
smarter people might find ways to turn it into a security exploit.