[PATCH] x86/paravirt: Match paravirt patchlet field definition ordering to initialization ordering

From: Ingo Molnar
Date: Thu Apr 25 2019 - 04:10:20 EST



* Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

> On Thu, 25 Apr 2019, Ingo Molnar wrote:
> > > +# else
> > > + .irq_restore_fl = { 0x50, 0x9d }, // push %eax; popf
> > > + .mmu_write_cr3 = { 0x0f, 0x22, 0xd8 }, // mov %eax, %cr3
> > > + .cpu_iret = { 0xcf }, // iret
> > > +# endif
> >
> > I think these open-coded hexa versions are somewhat fragile as well, how
> > about putting these into a .S file and controlling the sections in an LTO
> > safe manner there?
> >
> > That will also allow us to write proper asm, and global labels can be
> > used to extract the patchlets and their length?
>
> We are not changing these any other day and I really don't see a reason to
> have these things global just because.

Firstly, I'm not sure I understand the objection to using symbols: in an
average distro kernel we have more than 75,000 symbols - as long as the
namespace is clear they are there to be used, there's nothing wrong with
them per se.

Secondly, my main point is that putting these code patching sequences
into the .S it's easy to verify that the instructions and patchlets are
what we claim them to be.

Right now they are randomly ordered data tables that don't match to the
source code.

I know, because I tried. :-)

Here's the objdump -D output of the PATCH_XXL data table:

0000000000000010 <patch_data_xxl>:
10: fa cli
11: fb sti
12: 57 push %rdi
13: 9d popfq
14: 9c pushfq
15: 58 pop %rax
16: 0f 20 d0 mov %cr2,%rax
19: 0f 20 d8 mov %cr3,%rax
1c: 0f 22 df mov %rdi,%cr3
1f: 0f 09 wbinvd
21: 0f 01 f8 swapgs
24: 48 0f 07 sysretq
27: 0f 01 f8 swapgs
2a: 48 89 f8 mov %rdi,%rax

Note how this doesn't match up to the source code:

static const struct patch_xxl patch_data_xxl = {
.irq_irq_disable = { 0xfa }, // cli
.irq_irq_enable = { 0xfb }, // sti
.irq_save_fl = { 0x9c, 0x58 }, // pushf; pop %[re]ax
.mmu_read_cr2 = { 0x0f, 0x20, 0xd0 }, // mov %cr2, %[re]ax
.mmu_read_cr3 = { 0x0f, 0x20, 0xd8 }, // mov %cr3, %[re]ax
# ifdef CONFIG_X86_64
.irq_restore_fl = { 0x57, 0x9d }, // push %rdi; popfq
.mmu_write_cr3 = { 0x0f, 0x22, 0xdf }, // mov %rdi, %cr3
.cpu_wbinvd = { 0x0f, 0x09 }, // wbinvd
.cpu_usergs_sysret64 = { 0x0f, 0x01, 0xf8,
0x48, 0x0f, 0x07 }, // swapgs; sysretq
.cpu_swapgs = { 0x0f, 0x01, 0xf8 }, // swapgs
.mov64 = { 0x48, 0x89, 0xf8 }, // mov %rdi, %rax
# else
.irq_restore_fl = { 0x50, 0x9d }, // push %eax; popf
.mmu_write_cr3 = { 0x0f, 0x22, 0xd8 }, // mov %eax, %cr3
.cpu_iret = { 0xcf }, // iret
# endif
};

Note how they are reordered: in the generated code .irq_restore_fl comes
before .irq_save_fl, etc. This is because the field ordering in struct
patch_xxl does not match the initialization ordering of patch_data_xxl.

Third, beyond readability there's another advantage of my suggested
approach as well: for example that way we could verify the passed in
length with the patchlet length. Right now it's completely unverified:

case PARAVIRT_PATCH(ops.m): \
return PATCH(data, ops##_##m, ibuf, len)

right now we don't check whether the 'len' passed in by the usage site
matches the actual structure field length.

Although maybe we could do that with your C space structure as well.

Anyway, no strong feelings and I didn't want to create extra work for you
- but I think at minimum we should do the patch below, which matches up
the initialization order with the definition order - this at least makes
the disassembly reviewable:

0000000000000010 <patch_data_xxl>:
10: fa cli
11: fb sti
12: 9c pushfq
13: 58 pop %rax
14: 0f 20 d0 mov %cr2,%rax
17: 0f 20 d8 mov %cr3,%rax
1a: 0f 22 df mov %rdi,%cr3
1d: 57 push %rdi
1e: 9d popfq
1f: 0f 09 wbinvd
21: 0f 01 f8 swapgs
24: 48 0f 07 sysretq
27: 0f 01 f8 swapgs
2a: 48 89 f8 mov %rdi,%rax

And yes, with that applied it verifies 100%. :-)

Thanks,

Ingo

Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>

--- tip.orig/arch/x86/kernel/paravirt_patch.c
+++ tip/arch/x86/kernel/paravirt_patch.c
@@ -21,11 +21,11 @@
struct patch_xxl {
const unsigned char irq_irq_disable[1];
const unsigned char irq_irq_enable[1];
- const unsigned char irq_restore_fl[2];
const unsigned char irq_save_fl[2];
const unsigned char mmu_read_cr2[3];
const unsigned char mmu_read_cr3[3];
const unsigned char mmu_write_cr3[3];
+ const unsigned char irq_restore_fl[2];
# ifdef CONFIG_X86_64
const unsigned char cpu_wbinvd[2];
const unsigned char cpu_usergs_sysret64[6];
@@ -43,16 +43,16 @@ static const struct patch_xxl patch_data
.mmu_read_cr2 = { 0x0f, 0x20, 0xd0 }, // mov %cr2, %[re]ax
.mmu_read_cr3 = { 0x0f, 0x20, 0xd8 }, // mov %cr3, %[re]ax
# ifdef CONFIG_X86_64
- .irq_restore_fl = { 0x57, 0x9d }, // push %rdi; popfq
.mmu_write_cr3 = { 0x0f, 0x22, 0xdf }, // mov %rdi, %cr3
+ .irq_restore_fl = { 0x57, 0x9d }, // push %rdi; popfq
.cpu_wbinvd = { 0x0f, 0x09 }, // wbinvd
.cpu_usergs_sysret64 = { 0x0f, 0x01, 0xf8,
0x48, 0x0f, 0x07 }, // swapgs; sysretq
.cpu_swapgs = { 0x0f, 0x01, 0xf8 }, // swapgs
.mov64 = { 0x48, 0x89, 0xf8 }, // mov %rdi, %rax
# else
- .irq_restore_fl = { 0x50, 0x9d }, // push %eax; popf
.mmu_write_cr3 = { 0x0f, 0x22, 0xd8 }, // mov %eax, %cr3
+ .irq_restore_fl = { 0x50, 0x9d }, // push %eax; popf
.cpu_iret = { 0xcf }, // iret
# endif
};