Re: [Regression v4.2 ?] 32-bit seccomp-BPF returned errno values wrong in VM?

From: Denys Vlasenko
Date: Fri Aug 14 2015 - 07:20:35 EST

On 08/14/2015 12:27 AM, Linus Torvalds wrote:
> On Thu, Aug 13, 2015 at 2:35 PM, Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
>> I suspect this change:
>> .macro auditsys_entry_common
>> ...
>> movl %ebx,%esi /* 2nd arg: 1st syscall arg */
>> movl %eax,%edi /* 1st arg: syscall number */
>> call __audit_syscall_entry
>> - movl RAX(%rsp),%eax /* reload syscall number */
>> - cmpq $(IA32_NR_syscalls-1),%rax
>> - ja ia32_badsys
>> + movl ORIG_RAX(%rsp),%eax /* reload syscall number */
>> We were reloading syscall# from pt_regs->ax.
>> After the patch, pt_regs->ax isn't equal to syscall# on entry,
>> instead it contains -ENOSYS. Therefore the change shown above
>> was made, to reload it from pt_regs->orig_ax.
>> Well. This still should work... in fact it is "more correct"
>> than it was before...
> Well, since it doesn't work, that's clearly not the case.
> Also, you do realize that ORIG_RAX can get changed by signal handling
> and ptrace?

I am very aware of that, yes. If it changes, we should use *it*.
That's why new code in this part is "more correct" than old one.

> In fact, I think that whole "save -ENOSYS to pt_regs->ax" is BS.
> Exactly because we use pt_regs->ax for ptrace etc, and you've changed
> the register state we expose.

ptrace always sees pt_regs->ax = -ENOSYS on syscall entry.
That's part of the ABI. Syscall# is in pt_regs->orig_ax.

We used to do that _only_ on ptrace code path, the fast path
didn't store -ENOSYS in pt_regs->ax. This optimization ended up being
more pain than gain, and it was changed for 64-bit code by this commit:

commit 54eea9957f5763dd1a2555d7e4cb53b4dd389cc6
Author: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Date: Fri Sep 5 15:13:55 2014 -0700

x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls

I changed 32-bit compat code to do the same thing.

> I'm also going to be a *lot* less inclined to take all these idiotic
> low-level x86 changes from now on. It's been a total pain, for very
> little gain. These "cleanups" have been buggy as hell, and test
> coverage for the compat case is clearly lacking.

The code in question was an unholy mess, with about a half dozen
"clever optimizations" tangled together. Now it is much, much
less ugly.

It was nearly inevitable that something would break during untangling.

I understand the frustration of having things breaking
"because of the stupid cleanup".

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at