Re: [PATCH] Fix x86_64 TIF_SYSCALL_TRACE race in entry.S

From: Mathieu Desnoyers
Date: Sun Oct 28 2007 - 17:17:21 EST


* Andi Kleen (andi@xxxxxxxxxxxxxx) wrote:
> Mathieu Desnoyers <compudj@xxxxxxxxxxxxxxxxxx> writes:
>
> > We make sure that the thread flag read is coherent between our new test and the ALLWORK_MASK test by first saving it in a register used for both comparisons.
> >
>
> That doesn't make sense. If someone is setting those asynchronously you
> can always race.
>

Setting the thread flag being an atomic operation, I would expect
setting/clearing it asynchronously from another thread to be a valid
behavior. The only race that I foresee happens if the code that uses the
thread flag reads it more than once and expects it to stay unchanged
between the reads.

> You should really just stop the process like ptrace does before changing
> such things.
>

Iterating on each thread running on the system and stopping them when
we start kernel tracing seems to have the same impact as throwing a
brick in a quiet lake. :) I would prefer not to do that if we can do
otherwise.

Here is a modified version where I add my test only in the path where we
know that we have work to do, therefore removing the supplementary test
from the performance critical path. Would it be more acceptable ?



Fix x86_64 TIF_SYSCALL_TRACE race in entry.S

When the flag is inactive upon syscall entry and concurrently activated before
exit, we seem to reach a state where the top of stack is incorrect upon return
to user space.

Fix this by fixing the top of stack and jumping to int_ret_from_sys_call if we
detect that thread flags has been modified.

We make sure that the thread flag read is coherent between our new test and the ALLWORK_MASK test by first saving it in a register used for both comparisons.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx>
CC: Andi Kleen <ak@xxxxxx>
---
arch/x86_64/kernel/entry.S | 12 ++++++++++++
1 file changed, 12 insertions(+)

Index: linux-2.6-lttng/arch/x86_64/kernel/entry.S
===================================================================
--- linux-2.6-lttng.orig/arch/x86_64/kernel/entry.S 2007-10-27 14:01:12.000000000 -0400
+++ linux-2.6-lttng/arch/x86_64/kernel/entry.S 2007-10-28 16:33:56.000000000 -0400
@@ -267,6 +267,8 @@ sysret_check:
/* Handle reschedules */
/* edx: work, edi: workmask */
sysret_careful:
+ testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP),%edx
+ jnz ret_from_sys_call_trace
bt $TIF_NEED_RESCHED,%edx
jnc sysret_signal
TRACE_IRQS_ON
@@ -278,6 +280,16 @@ sysret_careful:
CFI_ADJUST_CFA_OFFSET -8
jmp sysret_check

+ret_from_sys_call_trace:
+ TRACE_IRQS_ON
+ sti
+ SAVE_REST
+ FIXUP_TOP_OF_STACK %rdi
+ movq %rsp,%rdi
+ LOAD_ARGS ARGOFFSET /* reload args from stack in case ptrace changed it */
+ RESTORE_REST
+ jmp int_ret_from_sys_call
+
/* Handle a signal */
sysret_signal:
TRACE_IRQS_ON


--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/