Re: [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime

From: Andy Lutomirski
Date: Mon Aug 03 2015 - 14:45:50 EST


On Mon, Aug 3, 2015 at 11:23 AM, Willy Tarreau <w@xxxxxx> wrote:
> For distros who prefer not to take the risk of completely disabling the
> modify_ldt syscall using CONFIG_MODIFY_LDT_SYSCALL, this patch adds a
> sysctl to enable, temporarily disable, or permanently disable it at
> runtime, and proposes to temporarily disable it by default. This can be
> a safe alternative. A message is logged if an attempt was stopped so that
> it's easy to spot if/when it is needed.
>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Signed-off-by: Willy Tarreau <w@xxxxxx>
> ---
> Documentation/sysctl/kernel.txt | 16 ++++++++++++++++
> arch/x86/Kconfig | 17 +++++++++++++++++
> arch/x86/kernel/ldt.c | 15 +++++++++++++++
> kernel/sysctl.c | 14 ++++++++++++++
> 4 files changed, 62 insertions(+)
>
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index 6fccb69..55648b9 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -41,6 +41,7 @@ show up in /proc/sys/kernel:
> - kptr_restrict
> - kstack_depth_to_print [ X86 only ]
> - l2cr [ PPC only ]
> +- modify_ldt [ X86 only ]
> - modprobe ==> Documentation/debugging-modules.txt
> - modules_disabled
> - msg_next_id [ sysv ipc ]
> @@ -391,6 +392,21 @@ This flag controls the L2 cache of G3 processor boards. If
>
> ==============================================================
>
> +modify_ldt: (X86 only)
> +
> +Enables (1), disables (0) or permanently disables (-1) the modify_ldt syscall.
> +Modifying the LDT (Local Descriptor Table) may be needed to run a 16-bit or
> +segmented code such as Dosemu or Wine. This is done via a system call which is
> +not needed to run portable applications, and which can sometimes be abused to
> +exploit some weaknesses of the architecture, opening new vulnerabilities.
> +

I'm not entirely convinced that the lock bit should work this way. At
some point, we might want a setting for "32-bit only" or even "32-bit,
present, not non-conforming only" (like we do unconditionally for
set_thread_area). When we do that, having -1 act like 0 might be
confusing.

I'd actually favor rigging it up to support enumerated values and/or
the word "locked" somewhere in the text. So we could have "0", "1
locked", "1" or even "enabled" "enabled locked", "disabled", "disabled
locked", "safe 32-bit", "safe 32-bit locked", etc.

I'll add an explicit 16-bit check to my infinite todo list for the asm
part. Now that the synchronous modify_ldt code is merged, it won't be
racy, and it would make a 32-bit only mode actually be useful (except
maybe on AMD -- someone needs to test just how badly broken IRET is on
AMD systems -- I know that AMD has IRET-to-16-bit differently broken
from Intel, and that causes test-cast failures. Grump.)

P.S. Hey CPU vendors: please consider stopping your utter suckage when
it comes to critical system instructions. Intel and AMD both
terminally screwed up IRET in multiple ways that clearly took actual
effort. Intel screwed up SYSRET pretty badly (AFAIK every single
64-bit OS has had at least one root hole as a result), and AMD screwed
SYSRET up differently (userspace crash bug that requires a performance
hit to mitigate because no one at AMD realized that one might preempt
a process during a syscall).

P.P.S. You know what would be *way* better than allowing IRET to
fault? Just allow IRET to continue executing the next instruction on
failure (I'm talking about #GP, #NP, and #SS here, not page faults).

P.P.P.S. Who thought that IRET faults unmasking NMIs made any sense
whatsoever when NMIs run on an IST stack? Seriously, people?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/