> I figured it out!
>
> GCC assumes that the stack is 16-byte aligned **before** the call
> instruction. Since call pushes rip to the stack, GCC will compile code
> assuming that on entrance to the function, the stack is -8 from a
> 16-byte aligned address.
>
> Since for TDs we do a ljmp to guest code, providing a function's
> address, the stack was not modified by a call instruction pushing rip to
> the stack, so the stack is 16-byte aligned when the guest code starts
> running, instead of 16-byte aligned -8 that GCC expects.
>
> For VMs, we set rip to a function pointer, and the VM starts running
> with a 16-byte aligned stack too.
>
> To fix this, I propose that in vm_arch_vcpu_add(), we align the
> allocated stack address and then subtract 8 from that:
>
> @@ -573,10 +573,13 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
> vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
> vcpu_setup(vm, vcpu);
>
> + stack_vaddr += (DEFAULT_STACK_PGS * getpagesize());
> + stack_vaddr = ALIGN_DOWN(stack_vaddr, 16) - 8;

The ALIGN_DOWN should be unnecessary; we've got larger issues if getpagesize() isn't
16-byte aligned and/or if __vm_vaddr_alloc() returns anything but a page-aligned
address. Maybe add a TEST_ASSERT() sanity check that stack_vaddr is page-aligned
at this point?

And in addition to the comment suggested by Maciej, can you also add a comment
explaining the -8 adjustment? Yeah, someone can go read the changelog, but I think
this is worth explicitly documenting in code.
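
To make the arithmetic concrete, here is a rough standalone sketch of the check
plus the commented adjustment. It is not the selftest code: initial_rsp() and
is_abi_entry_aligned() are hypothetical helpers, plain assert() stands in for
TEST_ASSERT(), and the page size and stack page count are assumed values.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES	4096ULL	/* assumed; the real code uses getpagesize() */
#define STACK_PAGES	5ULL	/* assumed stand-in for DEFAULT_STACK_PGS */

/* Hypothetical helper: compute the initial guest rsp from the stack base. */
static uint64_t initial_rsp(uint64_t stack_base)
{
	uint64_t stack_top;

	/* Stand-in for the suggested TEST_ASSERT() page-alignment check. */
	assert((stack_base % PAGE_SIZE_BYTES) == 0);

	/* A page-aligned base plus whole pages is also 16-byte aligned. */
	stack_top = stack_base + STACK_PAGES * PAGE_SIZE_BYTES;

	/*
	 * The compiler assumes rsp was 16-byte aligned *before* the CALL
	 * that entered the function, i.e. that rsp + 8 is 16-byte aligned
	 * on entry. Guest code is entered by loading rip directly, with no
	 * CALL, so subtract 8 to mimic the return address push.
	 */
	return stack_top - 8;
}

/* Hypothetical helper: true if rsp looks like it does right after a CALL. */
static int is_abi_entry_aligned(uint64_t rsp)
{
	return ((rsp + 8) % 16) == 0;
}
```

With a base of 0x10000, the unadjusted stack top is 0x15000, which fails
is_abi_entry_aligned(), while initial_rsp() returns 0x14ff8, which passes.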

Lastly, can you post it as a standalone patch?

Many thanks!