Re: [RFC][PATCH] x86: Make x32 syscall support conditional on a kernel parameter
From: Ben Hutchings
Date: Sun Dec 07 2014 - 17:42:01 EST
On Thu, 2014-11-06 at 14:46 -0800, Andy Lutomirski wrote:
> On 11/05/2014 07:53 PM, Ben Hutchings wrote:
> > Enabling x32 in a distribution default kernel increases its attack
> > surface while providing no benefit to the vast majority of its users.
> > No-one seems interested in regularly checking for vulnerabilities
> > specific to x32 (at least no-one with a white hat).
> >
> > Still, requiring a separate or custom configuration just to turn on
> > x32 seems wasteful. And the only differences on syscall entry are
> > two instructions (mask out the x32 flag and compare the syscall
> > number).
> >
> > So pad the standard comparison with a nop and add a kernel parameter
> > "syscall.x32" which controls whether this is replaced with the x32
> > version at boot time. Add a Kconfig parameter to set the default.
[...]
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -414,8 +414,12 @@ system_call_fastpath:
> > #if __SYSCALL_MASK == ~0
> > cmpq $__NR_syscall_max,%rax
> > #else
> > - andl $__SYSCALL_MASK,%eax
> > - cmpl $__NR_syscall_max,%eax
> > + .globl system_call_fast_compare
> > + .globl system_call_fast_compare_end
> > +system_call_fast_compare:
> > + cmpq $511,%rax /* x32 syscalls start at 512 */
> > + .byte P6_NOP4
> > +system_call_fast_compare_end:
> > #endif
> > ja ret_from_sys_call /* and return regs->ax */
> > movq %r10,%rcx
> > @@ -520,8 +524,12 @@ tracesys_phase2:
> > #if __SYSCALL_MASK == ~0
> > cmpq $__NR_syscall_max,%rax
> > #else
> > - andl $__SYSCALL_MASK,%eax
> > - cmpl $__NR_syscall_max,%eax
> > + .globl system_call_trace_compare
> > + .globl system_call_trace_compare_end
> > +system_call_trace_compare:
> > + cmpq $511,%rax /* x32 syscalls start at 512 */
> > + .byte P6_NOP4
> > +system_call_trace_compare_end:
>
> I think that, if x32 is disabled, that andl $__SYSCALL_MASK,%rax should
> be nopped out.
Yes, that's what happens.
> Can this, and the comparison, use the standard alternatives mechanism?
There isn't really a standard alternatives mechanism! There is a CPU-
features alternatives mechanism, an SMP alternatives mechanism and a PV
alternatives mechanism, but they don't share anything above the
text_poke*() functions.
> And why are there extra syscall numbers in x32 land? There's an x32 bit
> *and* extra numbers. I'm confused.
I think the extra numbers are assigned for system calls where adjustment
is always needed on entry and exit, while the x32 bit is used to trigger
compat handling if it's needed at a deeper level. is_compat_task()
tests the latter.
> That magic number is bad. I would make it a real #define and put an
> appropriate assertion into whatever generates syscalls_64.h.
>
> Also, if you do this, can you change all four instances of %eax in there
> to %rax? %eax is just wrong AFAICT.
I can only guess that %eax is currently used because it makes the code
slightly smaller (no REX prefix needed).
> Also, as a heads up to anyone who actually uses this: I was confused
> about the audit situation on x32-enabled kernels. I thought that this
> patch had been applied:
>
> https://www.redhat.com/archives/linux-audit/2014-May/msg00099.html
>
> but it wasn't. As a result, the commit message for this fix is wrong:
>
> 81f49a8fd708 x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
>
> It *is* a user-visible fix and should possibly go to -stable, especially
> if anyone actually uses this *!$@ thing. Of course, audit is royally
> screwed up on x32-enabled kernels anyway. I admit that I find myself
> barely caring.
Thanks; this went into 3.16.7-ckt2 and should go into a Debian kernel
update soon.
Ben.
--
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
- Carolyn Scheppner
Attachment:
signature.asc
Description: This is a digitally signed message part