[PATCH 12/24] x86/traps: Reconstruct pt_regs on task stack directly in fixup_bad_iret()

From: Lai Jiangshan
Date: Tue Aug 31 2021 - 13:51:48 EST


From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>

In the current code for handling a bad IRET, pt_regs is moved
several times:
  1. Reconstruct pt_regs from bad_regs into a temporary (tmp).
  2. Copy the tmp pt_regs onto the entry stack.
  3. Copy the pt_regs from the entry stack to the task stack in
     sync_regs().

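Reconstruct pt_regs from bad_regs directly on the task stack instead.
This drops the temporary copy and the copy through the entry stack, so
the sync_regs() call after fixup_bad_iret() is no longer needed.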

Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
---
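Not part of the patch: a minimal userspace sketch of the two-copy
reconstruction that the new fixup_bad_iret() performs, assuming a
simplified register layout. struct regs_sketch and fixup_sketch() are
hypothetical names used only for illustration, plain memcpy() stands in
for __memcpy(), and assert() stands in for BUG_ON(!user_mode(regs)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the x86-64 struct pt_regs: the general
 * purpose registers and orig_ax, followed by the five-word IRET frame.
 */
struct regs_sketch {
	uint64_t gpr[15];	/* r15 ... di */
	uint64_t orig_ax;
	uint64_t ip, cs, flags, sp, ss;	/* IRET frame */
};

/*
 * Rebuild the registers directly in task_regs (the destination that
 * the patch derives from cpu_current_top_of_stack). The caller must
 * guarantee that source and destination do not overlap.
 */
static struct regs_sketch *fixup_sketch(struct regs_sketch *task_regs,
					const struct regs_sketch *bad_regs)
{
	/* bad_regs->sp points at the IRET frame that failed to return. */
	memcpy((char *)task_regs + offsetof(struct regs_sketch, ip),
	       (const void *)(uintptr_t)bad_regs->sp, 5 * sizeof(uint64_t));

	/* Everything before ip comes from bad_regs itself. */
	memcpy(task_regs, bad_regs, offsetof(struct regs_sketch, ip));

	/* The reconstructed frame must describe a return to user mode. */
	assert((task_regs->cs & 3) != 0);
	return task_regs;
}
```

The sketch writes each byte of the destination exactly once, which is
the point of the change: there is no intermediate buffer to copy from.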
arch/x86/entry/traps.c | 35 +++++++++++++++++++----------------
1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/traps.c b/arch/x86/entry/traps.c
index b8fdf6a9682f..ab9866b650e7 100644
--- a/arch/x86/entry/traps.c
+++ b/arch/x86/entry/traps.c
@@ -740,25 +740,28 @@ struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs)
/*
* This is called from entry_64.S early in handling a fault
* caused by a bad iret to user mode. To handle the fault
- * correctly, we want to move our stack frame to where it would
- * be had we entered directly on the entry stack (rather than
- * just below the IRET frame) and we want to pretend that the
- * exception came from the IRET target.
+ * correctly, we want to pretend that the exception came from
+ * the IRET target (userspace) and reconstruct the original
+ * pt_regs from the bad regs on the task stack and switch to
+ * the task stack to handle it. The actual stack switch is
+ * done in entry_64.S
*/
- struct pt_regs tmp, *new_stack =
- (struct pt_regs *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
-
- /* Copy the IRET target to the temporary storage. */
- __memcpy(&tmp.ip, (void *)bad_regs->sp, 5*8);
+ struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;

- /* Copy the remainder of the stack from the current stack. */
- __memcpy(&tmp, bad_regs, offsetof(struct pt_regs, ip));
+ /*
+ * Copy the IRET target to the task pt_regs. This fault is caused
+ * by a bad IRET at native_irq_return_iret, which is not used by
+ * XENPV, so both bad_regs->sp and bad_regs are in the CPU ENTRY
+ * AREA. They cannot overlap with the task pt_regs, so
+ * we can safely use __memcpy().
+ */
+ __memcpy(&regs->ip, (void *)bad_regs->sp, 5*8);

- /* Update the entry stack */
- __memcpy(new_stack, &tmp, sizeof(tmp));
+ /* Copy the remainder of the pt_regs from the bad_regs. */
+ __memcpy(regs, bad_regs, offsetof(struct pt_regs, ip));

- BUG_ON(!user_mode(new_stack));
- return new_stack;
+ BUG_ON(!user_mode(regs));
+ return regs;
}

#ifdef CONFIG_PAGE_TABLE_ISOLATION
@@ -861,7 +864,7 @@ struct pt_regs *do_error_entry(struct pt_regs *eregs)
* pt_regs as if we faulted immediately after IRET and put
* pt_regs onto the real task stack.
*/
- return sync_regs(fixup_bad_iret(eregs));
+ return fixup_bad_iret(eregs);
}

/*
--
2.19.1.6.gb485710b