Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault

From: Andy Lutomirski
Date: Fri Feb 15 2019 - 18:52:01 EST




> On Feb 15, 2019, at 2:15 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Fri, 15 Feb 2019 09:55:32 -0800
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>>> On Fri, Feb 15, 2019 at 9:49 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>>>
>>> From: Changbin Du <changbin.du@xxxxxxxxx>
>>>
>>> The userspace can ask kprobe to intercept strings at any memory address,
>>> including invalid kernel address.
>>
>> Again, this is not about a "kernel address" at all.
>>
>> It's neither a kernel address _nor_ a user address. It's an invalid
>> address entirely, and there is nothing that makes it "kernel".
>>
>> Please don't talk about "invalid kernel addresses" when it is no such thing.
>>
>
> Ah, I see what you are saying. The example of the bug that the patch
> fixed used a non-canonical address, which is "garbage", and not kernel
> or user space. Point taken.
>
> But the issue is deeper than that. This code never bugged until the
> following commit was added:
>
> 9da3f2b7405 "x86/fault: BUG() when uaccess helpers fault on kernel addresses"
>
> Before that commit, this worked fine.
>
> In fact, we can trigger the same bug (with a slightly different
> message), if we use a kernel space address.
>
> To test this, I added the following variable somewhere in the code:
>
> void *sdr_ptr = 0xffffffff70112200;
>
> And then did the following:
>
> # echo 'p:fault do_sys_open arg=+0(@sdr_ptr):string' > /debug/tracing/kprobe_events
>
> Which will read the sdr_ptr variable just like the example did with +0(+0(%si)):string.
> But this time I control the what address it is triggered on.
>
> And it crashed with:
>
> [ 1447.876799] BUG: pagefault on kernel address 0xffffffff70112200 in non-whitelisted uaccess
>
> The above message in the bug in the patch was:
> "BUG: GPF in non-whitelisted uaccess (non-canonical address?)"
>
> [ 1447.885053] BUG: unable to handle kernel paging request at ffffffff70112200
> [ 1447.891997] #PF error: [normal kernel read fault]
> [ 1447.896695] PGD 8a212067 P4D 8a212067 PUD 0
> [ 1447.900679] BUG: pagefault on kernel address 0xffffffff70112200 in non-whitelisted uaccess
> [ 1447.900967] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 1447.913394] CPU: 7 PID: 1688 Comm: cat Not tainted 5.0.0-rc5-test+ #28
> [ 1447.919905] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
> [ 1447.928843] RIP: 0010:process_fetch_insn+0x1a0/0x470
> [ 1447.933804] Code: ff f0 80 48 03 80 83 80 18 1a 00 00 01 31 c9 eb 10 48 83 c2 01 85 c0 75 1f 81 f9 ff 0f 00 00 7f 17 0f 1f 00 0f ae e8 44 89 e0 <40> 8a 32 0f 1f 00 83 c1 01 40 84 f6 75 d9 65 48 8b 14 25 00 5c 01
> [ 1447.952520] RSP: 0018:ffffb77b80673d08 EFLAGS: 00010246
> [ 1447.957736] RAX: 0000000000000000 RBX: ffff91bfb7ab2c30 RCX: 0000000000000000
> [ 1447.964857] RDX: ffffffff70112200 RSI: ffffffff97267a80 RDI: 00007ffffffff000
> [ 1447.971981] RBP: 0000000000000000 R08: ffffffff70112200 R09: ffff91c014518000
> [ 1447.979105] R10: ffff91c01451801c R11: ffff91c0185e5800 R12: 0000000000000000
> [ 1447.986226] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 1447.993342] FS: 0000000000000000(0000) GS:ffff91c01a9c0000(0000) knlGS:0000000000000000
> [ 1448.001417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1448.007157] CR2: ffffffff70112200 CR3: 00000000b3f32004 CR4: 00000000001606e0
> [ 1448.014280] Call Trace:
> [ 1448.016737] kprobe_trace_func+0x278/0x360
> [ 1448.020834] ? preempt_count_sub+0x98/0xe0
> [ 1448.024931] ? do_sys_open+0x5/0x220
> [ 1448.028503] kprobe_dispatcher+0x36/0x50
> [ 1448.032426] ? do_sys_open+0x1/0x220
> [ 1448.036002] kprobe_ftrace_handler+0xb5/0x120
> [ 1448.040356] ftrace_ops_assist_func+0x87/0x110
> [ 1448.044797] 0xffffffffc06a30bf
> [ 1448.047939] ? __ia32_sys_open+0x20/0x20
> [ 1448.051860] ? do_syscall_64+0x1c/0x160
> [ 1448.055694] ? do_sys_open+0x1/0x220
> [ 1448.059268] do_sys_open+0x5/0x220
> [ 1448.062672] do_syscall_64+0x60/0x160
> [ 1448.066335] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Which triggered on the address: 0xffffffff70112200
>
> Which is perfectly canonical.
>
> The reason I use "kernel address" is because this became an issue after
> "x86/fault: BUG() when uaccess helpers fault on kernel addresses" was
> added and that explicitly states "kernel address".
>
> That patch added:
>
> + /*
> + * This is a faulting memory access in kernel space, on a kernel
> + * address, in a usercopy function. This can e.g. be caused by improper
> + * use of helpers like __put_user and by improper attempts to access
> + * userspace addresses in KERNEL_DS regions.
> + * The one (semi-)legitimate exception are probe_kernel_{read,write}(),
> + * which can be invoked from places like kgdb, /dev/mem (for reading)
> + * and privileged BPF code (for reading).
> + * The probe_kernel_*() functions set the kernel_uaccess_faults_ok flag
> + * to tell us that faulting on kernel addresses, and even noncanonical
> + * addresses, in a userspace accessor does not necessarily imply a
> + * kernel bug, root might just be doing weird stuff.
> + */
> + if (current->kernel_uaccess_faults_ok)
> + return false;
> +
> + /* This is bad. Refuse the fixup so that we go into die(). */
> + if (trapnr == X86_TRAP_PF) {
> + pr_emerg("BUG: pagefault on kernel address 0x%lx in non-whitelisted uaccess\n",
> + fault_addr);
> + } else {
> + pr_emerg("BUG: GPF in non-whitelisted uaccess (non-canonical address?)\n");
> + }
>
> Where this path triggered by the kprobe using copy_from_user(). So
> yeah, it can happen if it triggered on a non-canonical address (which is
> just garbage), but it can also trigger if it's a kernel address that
> isn't mapped either.
>
> So the comment in the code is not 100% accurate, because it isn't just
> a kernel address, but also a non-canonical one. Something tells me that
> the change log of the commit that checks this isn't written well
> either. What exactly can't be done now with uaccess code? I'm guessing
> that it should only be allowed to fault on user space addresses? So
> should I change this commit log to:
>
> kprobe: Do not use uaccess function to access non-user address that can fault
>
> And change all the "kernel address" mentions to "non-user address" as
> non-user covers both kernel address and non-canonical?
>
>
> -- Steve
>
> This patch allows me to modify the sdr_ptr as well from the tracing directory:
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 2cf3c747a357..292fe2ef0a45 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -7928,6 +7928,45 @@ static const struct file_operations buffer_percent_fops = {
> .llseek = default_llseek,
> };
>
> +void *sdr_ptr = 0xffffffff70112200;
> +
> +static ssize_t
> +sdr_ptr_read(struct file *filp, char __user *ubuf,
> + size_t cnt, loff_t *ppos)
> +{
> + char buf[64];
> + int r;
> +
> + r = sprintf(buf, "%px\n", sdr_ptr);
> +
> + return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> +}
> +
> +static ssize_t
> +sdr_ptr_write(struct file *filp, const char __user *ubuf,
> + size_t cnt, loff_t *ppos)
> +{
> + unsigned long val;
> + int ret;
> +
> + ret = kstrtoul_from_user(ubuf, cnt, 0, &val);
> + if (ret)
> + return ret;
> +
> + sdr_ptr = val;
> +
> + (*ppos)++;
> +
> + return cnt;
> +}
> +
> +static const struct file_operations sdr_ptr_fops = {
> + .open = tracing_open_generic,
> + .read = sdr_ptr_read,
> + .write = sdr_ptr_write,
> + .llseek = default_llseek,
> +};
> +
> struct dentry *trace_instance_dir;
>
> static void
> @@ -8188,6 +8227,9 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
> struct trace_event_file *file;
> int cpu;
>
> + trace_create_file("sdr_ptr", 0644, d_tracer,
> + NULL, &sdr_ptr_fops);
> +
> trace_create_file("available_tracers", 0444, d_tracer,
> tr, &show_traces_fops);
>


Iâm missing most of the context here, but even probe_kernel_...() is unwise for a totally untrustworthy address. It could be MMIO, for example.

If needed, we could come up with a safe-ish helper for tracing. For direct-map addresses, probe_kernel_...() is probably okay. Same for the current stack. Otherwise we could walk the page tables and check that the address is cacheable, I suppose, although this is slightly dubious if we donât also check MTRRs. We could also check that the PA is in main memory, I suppose, although this may have unfortunate interactions with the MCE code.