Re: pt_regs->ax == -ENOSYS

From: Andy Lutomirski
Date: Tue Apr 27 2021 - 19:23:22 EST


On Tue, Apr 27, 2021 at 3:58 PM H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> On 4/27/21 2:28 PM, Andy Lutomirski wrote:
> >
> >> On Apr 27, 2021, at 2:15 PM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
> >>
> >> Trying to stomp out some possible cargo cult programming?
> >>
> >> In the process of going through the various entry code paths, I have to admit to being a bit confused why pt_regs->ax is set to -ENOSYS very early in the system call path.
> >>
> >
> > It has to get set to _something_, and copying orig_ax seems perhaps silly. There could also be code that relies on ptrace poking -1 into the nr resulting in -ENOSYS.
> >
>
> Yeah. I obviously ran into this working on the common entry-exit code
> for FRED; the frame has annoyingly different formats because of this,
> and I wanted to avoid slowing down the system call path.
>
> >> What is perhaps even more confusing is:
> >>
> >> __visible noinstr void do_syscall_64(struct pt_regs *regs, unsigned long nr)
> >> {
> >>         nr = syscall_enter_from_user_mode(regs, nr);
> >>
> >>         instrumentation_begin();
> >>         if (likely(nr < NR_syscalls)) {
> >>                 nr = array_index_nospec(nr, NR_syscalls);
> >>                 regs->ax = sys_call_table[nr](regs);
> >> #ifdef CONFIG_X86_X32_ABI
> >>         } else if (likely((nr & __X32_SYSCALL_BIT) &&
> >>                           (nr & ~__X32_SYSCALL_BIT) < X32_NR_syscalls)) {
> >>                 nr = array_index_nospec(nr & ~__X32_SYSCALL_BIT,
> >>                                         X32_NR_syscalls);
> >>                 regs->ax = x32_sys_call_table[nr](regs);
> >> #endif
> >>         }
> >>         instrumentation_end();
> >>         syscall_exit_to_user_mode(regs);
> >> }
> >> #endif
> >>
> >> Now, unless I'm completely out to sea, it seems to me that if syscall_enter_from_user_mode() changes the system call number to an invalid number and pt_regs->ax to !-ENOSYS then the system call will return a different value(!) depending on if it is out of range for the table (whatever was poked into pt_regs->ax) or if it corresponds to a hole in the table. This seems to me at least to be The Wrong Thing.
> >
> > I think you’re right.
> >
> >>
> >> Calling regs->ax = sys_ni_syscall() in an else clause would arguably be the right thing here, except possibly in the case where nr (or (int)nr, see below) == -1 or < 0.
> >
> > I think the check should be -1 for 64 bit but (u32)nr == (u32)-1 for the 32-bit path. Does that seem reasonable?
>
> I'm thinking overall that depending on 64-bit %rax is once again a
> mistake; I realize that the assembly code that did that kept breaking
> because people messed with it, but we still have:
>
> /*
>  * Only the low 32 bits of orig_ax are meaningful, so we return int.
>  * This importantly ignores the high bits on 64-bit, so comparisons
>  * sign-extend the low 32 bits.
>  */
> static inline int syscall_get_nr(struct task_struct *task,
>                                  struct pt_regs *regs)
> {
>         return regs->orig_ax;
> }
>
> "Different interpretation of the same data" is a notorious security
> trap. Zero-extending orig_ax would cause different behavior on 32 and 64
> bits and differ from the above, so I'm thinking that just once and for
> all defining the system call number as a signed int for all the x86 ABIs
> would be the sanest.
>
> It still doesn't really answer the question if "movq $-1,%rax; syscall"
> or "movl $-1,%eax; syscall" could somehow cause bad things to happen,
> though, which makes me a little bit nervous still.
>

I much prefer the model of saying that all the bits that make sense
for the syscall type (all 64 for 64-bit SYSCALL and the low 32 for
everything else) are significant. This way there are no weird reserved
bits, no weird ptrace() interactions, etc. I'm a tiny bit concerned
that this would cause a backwards compatibility issue, but not very
concerned. This would involve changing syscall_get_nr(), but that
doesn't seem so bad. The biggest problem is that seccomp hardcodes
syscall numbers as 32 bits.
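
To make the mismatch concrete, here is a rough userspace sketch of this
model next to the int view that syscall_get_nr() (and seccomp's int nr)
gives. It is not kernel code: enum demo_entry_kind, DEMO_NR_SYSCALLS,
demo_nr_valid_full_width() and demo_get_nr_as_int() are all made-up
names for illustration, and the table size is arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry types; not kernel definitions. */
enum demo_entry_kind { DEMO_ENTRY_SYSCALL64, DEMO_ENTRY_COMPAT32 };

/* Arbitrary table size for the demonstration. */
#define DEMO_NR_SYSCALLS 4u

/*
 * "All bits that make sense for the entry type are significant":
 * 64-bit SYSCALL compares the full register, everything else
 * compares only the low 32 bits.
 */
static bool demo_nr_valid_full_width(enum demo_entry_kind kind, uint64_t ax)
{
	uint64_t nr = (kind == DEMO_ENTRY_SYSCALL64) ? ax : (uint32_t)ax;

	return nr < DEMO_NR_SYSCALLS;
}

/*
 * The int view: truncate to the low 32 bits and sign-extend.
 * The out-of-range conversion is implementation-defined in ISO C;
 * gcc and clang reduce modulo 2^32, which is what is assumed here.
 */
static int demo_get_nr_as_int(uint64_t orig_ax)
{
	return (int)orig_ax;
}
```

With ax = 0x100000001, the full-width check rejects the number on the
64-bit SYSCALL path, while the int view reports syscall 1. That gap is
what a syscall_get_nr() change would have to close.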

An alternative would be to declare that we always truncate to 32 bits,
except that 64-bit SYSCALL with high bits set is an error and results
in -ENOSYS. The ptrace interaction there is potentially nasty.
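
As a rough userspace sketch of that alternative (again not kernel code;
demo_decode_nr_truncating() is a hypothetical helper, and the bool
argument stands in for "this entry came from 64-bit SYSCALL"):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical decode step for the "always truncate" model.
 * Returns 0 and stores the 32-bit syscall number in *nr, or returns
 * -ENOSYS when a 64-bit SYSCALL arrives with any high bit set.
 */
static long demo_decode_nr_truncating(bool is_syscall64, uint64_t ax,
				      uint32_t *nr)
{
	if (is_syscall64 && (ax >> 32) != 0)
		return -ENOSYS;

	*nr = (uint32_t)ax;	/* low 32 bits only */
	return 0;
}
```

Under this sketch "movq $-1,%rax; syscall" is rejected up front, while
"movl $-1,%eax; syscall" zero-extends into rax, passes the high-bits
check and yields nr == 0xffffffff, which still has to be rejected by
the normal range check.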

Basically, all choices here kind of suck, and I haven't done a real
analysis of all the issues...

--Andy