Re: Compat syscall instrumentation and return from execve issue

From: Andy Lutomirski
Date: Mon Nov 09 2015 - 20:51:49 EST

On Mon, Nov 9, 2015 at 1:12 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Mon, 9 Nov 2015 12:57:06 -0800
> Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> > The solution I suggested wouldn't touch any asm code. The only change
>> > would be to reserve the TS_EXECVE flag. Actually, come to think of it,
>> > we could have Mathieu's TS_ORIG_COMPAT flag, and still only have the
>> > tracepoint syscall set it, such that the matching tracepoint syscall
>> > exit would know that the initial call was COMPAT or not.
>> Someone needs to clear TS_EXECVE, though.
> Well, it gets set and cleared by the syscall enter (same for
> TS_ORIG_COMPAT), and exit for that matter.
> It's trivial to have a tracepoint hook added when either system call
> enter or exit tracepoints are enabled. Thus, the setting and clearing of
> the flag can be done by another callback at those tracepoints.
>> >
>> > The goal is only to make sure that the system call exit tracepoint
>> > matches the system call enter tracepoint.
>> >
>> > The system call enter would set or clear the TS_ORIG_COMPAT if the
>> > TS_COMPAT is set when entering the system call, and it would check that
>> > flag when exiting the system call.
>> This seems a bit odd, though, since we aren't very good about
>> preserving the syscall nr or the args through syscall processing. In
>> any event, in the new improved x86 syscall code, we know what arch we
>> are just by following the control flow, so no flags should be needed.
>> Hence my suggestion of just adding an "unsigned int arch" to the
>> return slowpath.
> I guess I don't understand this "unsigned int arch".
> When the execve system call is called, the task is running in x86_64 mode,
> and execve then switches it to ia32 mode. On return, the system call exit
> tracepoint has the x86_64 system call number, but if it checks what state
> the task is in, it sees ia32 state and reports the ia32 name for that
> number instead.
> For example, on x86_64, execve is 59, and that number is passed to the
> system call enter tracepoint. On return from the system call, the
> system call exit tracepoint is also called with 59 as the system call
> number, but if that tracepoint checks the task state, it will think it's
> returning from the "olduname" system call (that's 59 for ia32).
> What change are you making to solve this?

do_syscall_32_irqs_on would call syscall_return_slowpath(regs,
AUDIT_ARCH_I386). do_syscall_64 (which doesn't exist yet) would call
syscall_return_slowpath(regs, AUDIT_ARCH_X86_64).
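
To make the difference concrete, here is a rough userspace model of the two
ways the exit path could decide which syscall table a number belongs to. It
is only a sketch, not kernel code: struct task_model, syscall_name() and the
two exit_name_* helpers are hypothetical names made up for illustration, and
the name table is trimmed to the numbers mentioned in this thread. The
AUDIT_ARCH_* values are the ones defined in include/uapi/linux/audit.h.

```c
#include <stdint.h>
#include <string.h>

/* Same values as include/uapi/linux/audit.h. */
#define AUDIT_ARCH_I386   0x40000003u
#define AUDIT_ARCH_X86_64 0xC000003Eu

/* Hypothetical stand-in for the task's current mode (think TS_COMPAT). */
struct task_model {
	int is_compat_now;	/* nonzero once execve has switched to ia32 */
};

/* Tiny name table covering only the numbers discussed in this thread. */
static const char *syscall_name(uint32_t arch, long nr)
{
	if (arch == AUDIT_ARCH_X86_64 && nr == 59)
		return "execve";
	if (arch == AUDIT_ARCH_I386 && nr == 59)
		return "olduname";	/* name as used in this thread */
	if (arch == AUDIT_ARCH_I386 && nr == 11)
		return "execve";
	return "unknown";
}

/*
 * Problematic variant: the exit path infers the arch from the task's
 * state at exit time, which execve may have changed since entry.
 */
static const char *exit_name_from_task_state(const struct task_model *t,
					     long nr)
{
	uint32_t arch = t->is_compat_now ? AUDIT_ARCH_I386
					 : AUDIT_ARCH_X86_64;

	return syscall_name(arch, nr);
}

/*
 * Suggested variant: the entry path already knows which arch it is
 * (from control flow) and hands that to the exit path explicitly.
 */
static const char *exit_name_from_entry_arch(uint32_t arch, long nr)
{
	return syscall_name(arch, nr);
}
```

For a 64-bit task that execs a 32-bit binary, the model has is_compat_now set
by the time the syscall returns. exit_name_from_task_state(&t, 59) then
resolves 59 against the ia32 table, while
exit_name_from_entry_arch(AUDIT_ARCH_X86_64, 59) still resolves it as execve,
because the arch comes from the entry path rather than from the task state.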
