Re: [RFC PATCH v2 2/6] x86/arch_prctl: add ARCH_GET_NOPTI and ARCH_SET_NOPTI to enable/disable PTI

From: Willy Tarreau
Date: Tue Jan 09 2018 - 09:43:53 EST


Hi Boris!

On Tue, Jan 09, 2018 at 03:17:13PM +0100, Borislav Petkov wrote:
> On Tue, Jan 09, 2018 at 01:56:16PM +0100, Willy Tarreau wrote:
> > This allows to report the current state of the PTI protection and to
> > enable or disable it for the current process. The state change is only
> > allowed if the mm is not shared (no threads have been created yet).
> >
> > Setting the flag to disable the protection is subject to CAP_SYS_RAWIO.
> > However it is possible to re-enable the protection without this privilege.
> >
> > Signed-off-by: Willy Tarreau <w@xxxxxx>
> > Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> > Cc: Borislav Petkov <bp@xxxxxxxxx>
> > Cc: Brian Gerst <brgerst@xxxxxxxxx>
> > Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> > Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> > Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> >
> > v2:
> > - use {set,clear}_thread_flag() as recommended by Peter
> > - use task->mm->context.pti_disable instead of task flag
> > - check for mm_users == 1
> > - check for CAP_SYS_RAWIO only when setting, not clearing
> > - make the code depend on CONFIG_PAGE_TABLE_ISOLATION
> > ---
> > arch/x86/include/uapi/asm/prctl.h | 3 +++
> > arch/x86/kernel/process_64.c | 15 +++++++++++++++
> > 2 files changed, 18 insertions(+)
> >
> > diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
> > index 5a6aac9..1f1b5bc 100644
> > --- a/arch/x86/include/uapi/asm/prctl.h
> > +++ b/arch/x86/include/uapi/asm/prctl.h
> > @@ -10,6 +10,9 @@
> > #define ARCH_GET_CPUID 0x1011
> > #define ARCH_SET_CPUID 0x1012
> >
> > +#define ARCH_GET_NOPTI 0x1021
> > +#define ARCH_SET_NOPTI 0x1022
> > +
> > #define ARCH_MAP_VDSO_X32 0x2001
> > #define ARCH_MAP_VDSO_32 0x2002
> > #define ARCH_MAP_VDSO_64 0x2003
> > diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> > index c754662..9516310 100644
> > --- a/arch/x86/kernel/process_64.c
> > +++ b/arch/x86/kernel/process_64.c
> > @@ -654,7 +654,22 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
> > ret = put_user(base, (unsigned long __user *)arg2);
> > break;
> > }
> > +#ifdef CONFIG_PAGE_TABLE_ISOLATION
>
> Actually, I meant to add a new CONFIG item only for this feature which
> depends on CONFIG_PAGE_TABLE_ISOLATION. So that people can disable the
> per-mm thing when they don't want it.

I see, and I'm not particularly against this, but what use case do you
have in mind precisely? I doubt it's just about saving a few tens of
bytes, so you're probably more concerned about the potential risks this
opens up? But given that we only allow this for processes with
CAP_SYS_RAWIO, which already have access to /dev/mem and many other
things, don't you think there are much easier ways to dump kernel memory
in this case than trying to inject some Meltdown code into the victim
process? Or maybe you have other cases in mind that I'm not seeing.
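
For reference, the rules from the commit message (no change once the mm
is shared, CAP_SYS_RAWIO needed only to disable PTI, re-enabling always
permitted) can be modelled in user space roughly as below. This is only
a sketch of the policy, not the kernel code: the function name
nopti_change_allowed() and its parameters are made up for illustration,
and the order of the two checks is an assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the ARCH_SET_NOPTI policy described
 * in the commit message. Not taken from the patch itself.
 *
 * disable_pti:       true if the caller asks to turn PTI off
 * mm_users:          number of users of the mm (1 means not shared)
 * has_cap_sys_rawio: true if the caller holds CAP_SYS_RAWIO
 */
static bool nopti_change_allowed(bool disable_pti, int mm_users,
                                 bool has_cap_sys_rawio)
{
	/* Any state change is refused once the mm is shared (threads exist). */
	if (mm_users != 1)
		return false;

	/* Only disabling the protection requires the capability. */
	if (disable_pti && !has_cap_sys_rawio)
		return false;

	/* Re-enabling PTI is allowed without any privilege. */
	return true;
}
```

With this model, an unprivileged single-threaded process can re-enable
PTI but cannot disable it, which is the asymmetry the argument above
relies on.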

Thanks,
willy