Re: pt_regs->ax == -ENOSYS
From: Kees Cook
Date: Tue Apr 27 2021 - 19:32:12 EST
On Tue, Apr 27, 2021 at 03:58:03PM -0700, H. Peter Anvin wrote:
> On 4/27/21 2:28 PM, Andy Lutomirski wrote:
> >
> > > On Apr 27, 2021, at 2:15 PM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
> > >
> > > Trying to stomp out some possible cargo cult programming?
> > >
> > > In the process of going through the various entry code paths, I have to admit to being a bit confused why pt_regs->ax is set to -ENOSYS very early in the system call path.
> > >
> >
> > It has to get set to _something_, and copying orig_ax seems perhaps silly. There could also be code that relies on ptrace poking -1 into the nr resulting in -ENOSYS.
> >
>
> Yeah. I obviously ran into this working on the common entry-exit code for
> FRED; the frame has annoyingly different formats because of this, and I
> wanted to avoid slowing down the system call path.
>
> > > What is perhaps even more confusing is:
> > >
> > > __visible noinstr void do_syscall_64(struct pt_regs *regs, unsigned long nr)
> > > {
> > > 	nr = syscall_enter_from_user_mode(regs, nr);
> > >
> > > 	instrumentation_begin();
> > > 	if (likely(nr < NR_syscalls)) {
> > > 		nr = array_index_nospec(nr, NR_syscalls);
> > > 		regs->ax = sys_call_table[nr](regs);
> > > #ifdef CONFIG_X86_X32_ABI
> > > 	} else if (likely((nr & __X32_SYSCALL_BIT) &&
> > > 			  (nr & ~__X32_SYSCALL_BIT) < X32_NR_syscalls)) {
> > > 		nr = array_index_nospec(nr & ~__X32_SYSCALL_BIT,
> > > 					X32_NR_syscalls);
> > > 		regs->ax = x32_sys_call_table[nr](regs);
> > > #endif
> > > 	}
> > > 	instrumentation_end();
> > > 	syscall_exit_to_user_mode(regs);
> > > }
> > > #endif
> > >
> > > Now, unless I'm completely out to sea, it seems to me that if syscall_enter_from_user_mode() changes the system call number to an invalid number and pt_regs->ax to !-ENOSYS then the system call will return a different value(!) depending on if it is out of range for the table (whatever was poked into pt_regs->ax) or if it corresponds to a hole in the table. This seems to me at least to be The Wrong Thing.
> >
> > I think you’re right.
> >
> > >
> > > Calling regs->ax = sys_ni_syscall() in an else clause would arguably be the right thing here, except possibly in the case where nr (or (int)nr, see below) == -1 or < 0.
> >
> > I think the check should be -1 for 64 bit but (u32)nr == (u32)-1 for the 32-bit path. Does that seem reasonable?
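A minimal userspace model of the inconsistency described above. Everything here (model_regs, model_table, dispatch_current(), dispatch_proposed()) is a hypothetical stand-in, not kernel code. Slot 1 plays the role of a hole in the table, and the "proposed" variant sketches the suggested else clause with the 64-bit -1 exemption; the 32-bit path would compare (u32)nr against (u32)-1 instead.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for pt_regs: only the return-value slot. */
struct model_regs {
	long ax;
};

typedef long (*model_syscall_fn)(struct model_regs *);

/* Stand-in for sys_ni_syscall(). */
static long model_ni_syscall(struct model_regs *regs)
{
	(void)regs;
	return -ENOSYS;
}

/* Stand-in for an implemented syscall; 42 is an arbitrary result. */
static long model_impl(struct model_regs *regs)
{
	(void)regs;
	return 42;
}

/* Slot 1 is a "hole", filled with the not-implemented stub. */
static model_syscall_fn const model_table[] = {
	model_impl,
	model_ni_syscall,
};
#define MODEL_NR_SYSCALLS (sizeof(model_table) / sizeof(model_table[0]))

/* Shape of the quoted code: out-of-range nr leaves regs->ax untouched,
 * so the caller sees whatever was stored there before dispatch. */
static void dispatch_current(struct model_regs *regs, unsigned long nr)
{
	if (nr < MODEL_NR_SYSCALLS)
		regs->ax = model_table[nr](regs);
}

/* Sketch of the suggested fix: out-of-range nr also yields -ENOSYS,
 * except nr == -1, which keeps the value a tracer may have stored. */
static void dispatch_proposed(struct model_regs *regs, unsigned long nr)
{
	if (nr < MODEL_NR_SYSCALLS)
		regs->ax = model_table[nr](regs);
	else if (nr != (unsigned long)-1)
		regs->ax = model_ni_syscall(regs);
}
```

With a stale value of 1234 in ax, dispatch_current() returns -ENOSYS for the hole but 1234 for an out-of-range number; dispatch_proposed() returns -ENOSYS for both.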
FWIW, there is some confusion with how syscall_trace_enter() signals the
"skip syscall" condition (-1L), vs actually calling "syscall -1". Right
now they're not really distinguished, and the early ENOSYS is there so that
"pretend it happened" can be implemented (by either ptrace or seccomp).
As in, "set return value to $whatever, and don't run a syscall".
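The "skip syscall" convention can be sketched the same way, assuming a simplified hook that either fakes a result (writes ax and returns -1) or passes the syscall number through. All names are hypothetical. The point is only that an untraced "syscall -1" and a tracer-requested skip take the same path, and the early -ENOSYS is what the untraced case ends up returning.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant pt_regs fields. */
struct model_regs {
	long ax;      /* return-value slot */
	long orig_ax; /* syscall number as entered */
};

/* Stand-in for a ptrace/seccomp entry hook: to "pretend it happened",
 * it stores the result in ax and returns -1 ("skip"). */
static long model_enter_hook(struct model_regs *regs, bool fake, long fake_ret)
{
	if (fake) {
		regs->ax = fake_ret;
		return -1L;
	}
	return regs->orig_ax;
}

static long model_syscall(long nr_in, bool fake, long fake_ret)
{
	/* ax is preset to -ENOSYS before the hook runs. */
	struct model_regs regs = { .ax = -ENOSYS, .orig_ax = nr_in };
	long nr = model_enter_hook(&regs, fake, fake_ret);

	if (nr == -1L)
		return regs.ax; /* skip: return whatever is already in ax */

	/* Real dispatch would go here; this model returns 0 for any other nr. */
	return 0;
}
```

Here model_syscall(-1, false, 0) and model_syscall(-1, true, -ENOSYS) are indistinguishable, which is the ambiguity described above.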
> I'm thinking overall that depending on 64-bit %rax is once again a mistake;
> I realize that the assembly code that did that kept breaking because people
> messed with it, but we still have:
>
> /*
> * Only the low 32 bits of orig_ax are meaningful, so we return int.
> * This importantly ignores the high bits on 64-bit, so comparisons
> * sign-extend the low 32 bits.
> */
> static inline int syscall_get_nr(struct task_struct *task,
> 				 struct pt_regs *regs)
> {
> 	return regs->orig_ax;
> }
>
> "Different interpretation of the same data" is a notorious security trap.
> Zero-extending orig_ax would cause different behavior on 32 and 64 bits and
> differ from the above, so I'm thinking that just once and for all defining
> the system call number as a signed int for all the x86 ABIs would be the
> sanest.
>
> It still doesn't really answer the question if "movq $-1,%rax; syscall" or
> "movl $-1,%eax; syscall" could somehow cause bad things to happen, though,
> which makes me a little bit nervous still.
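To make the "different interpretation of the same data" concern concrete: on x86-64, movl $-1,%eax zero-extends into %rax, while movq $-1,%rax sets all 64 bits. The sketch below uses hypothetical helper names. Reading only the low 32 bits as a signed int (as syscall_get_nr() above does) maps both encodings to -1, whereas the full 64-bit values differ. Converting 0xffffffff to int is implementation-defined in ISO C; GCC and Clang reduce it modulo 2^32, giving -1.

```c
#include <stdint.h>

/* Low 32 bits interpreted as a signed int, like syscall_get_nr(). */
static int nr_as_int(uint64_t orig_ax)
{
	return (int)(uint32_t)orig_ax;
}

/* The full 64-bit register value, unmodified. */
static uint64_t nr_as_u64(uint64_t orig_ax)
{
	return orig_ax;
}
```

So the same two register states compare equal under one reading and unequal under the other.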
I don't think this matters? What's the condition you're worried about
here? The syscall table lookup is going to be safe.
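A sketch of why the lookup tolerates either encoding, assuming a hypothetical table size of 450. The bounds check is an unsigned comparison, so both 0xffffffff and 0xffffffffffffffff fail it. The index clamp is modeled on the kernel's generic C fallback of array_index_mask_nospec() (x86 has its own asm version); this is a model, not the kernel's code.

```c
#include <limits.h>
#include <stdbool.h>

#define MODEL_NR_SYSCALLS 450UL /* hypothetical table size */

/* All ones when index < size, zero otherwise, computed without a branch.
 * Assumes arithmetic right shift of negative longs (true for GCC/Clang). */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Hypothetical lookup: reject out-of-range numbers, then clamp the index
 * so that even a mispredicted branch cannot read past the table. */
static bool model_lookup(unsigned long nr, unsigned long *slot)
{
	if (nr >= MODEL_NR_SYSCALLS)
		return false; /* both -1 encodings land here */
	*slot = nr & index_mask(nr, MODEL_NR_SYSCALLS);
	return true;
}
```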
--
Kees Cook