Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault
From: Jann Horn
Date: Fri Feb 22 2019 - 16:44:24 EST
(adding some people from the text_poke series to the thread, removing stable@)
On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
> >> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote:
> >> On Fri, 22 Feb 2019 11:27:05 -0800
> >> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
> >>
> >>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote:
> >>>>
> >>>> Then we should still probably fix up "__probe_kernel_read()" to not
> >>>> allow user accesses. The easiest way to do that is actually likely to
> >>>> use the "unsafe_get_user()" functions *without* doing a
> >>>> uaccess_begin(), which will mean that modern CPU's will simply fault
> >>>> on a kernel access to user space.
> >>>
> >>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read()
> >>> and users pass both user and kernel addresses into it and expect
> >>> that the helper will actually try to read from that address.
> >>>
> >>> If __probe_kernel_read will suddenly start failing on all user addresses
> >>> it will break the expectations.
> >>> How do we solve it in bpf_probe_read?
> >>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte
> >>> in the loop?
> >>> That's doable, but people already complain that bpf_probe_read() is slow
> >>> and shows up in their perf report.
> >>
> >> We're changing kprobes to add a specific flag to say that we want to
> >> differentiate between kernel or user reads. Can this be done with
> >> bpf_probe_read()? If it's showing up in perf report, I doubt a single
> >
> > so you're saying you will break existing kprobe scripts?
> > I don't think it's a good idea.
> > It's not acceptable to break bpf_probe_read uapi.
> >
>
> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don't separately know whether it's a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don't know whether I'll win this fight, but the uapi will probably have to change for at least s390x.
>
> What to do about existing scripts is a different question.
This lack of logical separation between user and kernel addresses
might interact interestingly with the text_poke series, specifically
"[PATCH v3 05/20] x86/alternative: Initialize temporary mm for
patching" (https://lore.kernel.org/lkml/20190221234451.17632-6-rick.p.edgecombe@xxxxxxxxx/)
and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text
poking" (https://lore.kernel.org/lkml/20190221234451.17632-7-rick.p.edgecombe@xxxxxxxxx/),
right? If someone manages to get a tracing BPF program to trigger in a
task that has switched to the patching mm, could they use
bpf_probe_write_user() - which uses probe_kernel_write() after
checking that KERNEL_DS isn't active and that access_ok() passes - to
overwrite kernel text that is mapped writable in the patching mm?
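
To make the concern concrete, here is a rough user-space model, not kernel
code. Every name in it (model_mm, model_access_ok, model_probe_write_user,
MODEL_TASK_SIZE) is made up for illustration, and it assumes the patching mm
maps the text somewhere in the user address range. The point is only that a
range check plus a KERNEL_DS check cannot tell which mm is active:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_TASK_SIZE 16u  /* hypothetical user/kernel split for the model */

static uint8_t model_user_page[MODEL_TASK_SIZE];   /* ordinary user memory */
static uint8_t model_kernel_text[MODEL_TASK_SIZE]; /* stands in for kernel text */

/* An "mm" here is just whatever backs the user-range addresses. */
struct model_mm {
	uint8_t *low_half;
};

/* Pure range check: it knows nothing about which mm is active. */
static bool model_access_ok(size_t addr, size_t len)
{
	return addr <= MODEL_TASK_SIZE && len <= MODEL_TASK_SIZE - addr;
}

/* Performs the two checks described above, then writes through the active mm. */
static int model_probe_write_user(const struct model_mm *active, bool kernel_ds,
				  size_t dst, const void *src, size_t len)
{
	if (kernel_ds)
		return -1;	/* refuse while the KERNEL_DS stand-in is set */
	if (!model_access_ok(dst, len))
		return -1;	/* refuse addresses outside the user range */
	memcpy(active->low_half + dst, src, len);
	return 0;
}
```

With one model_mm whose low_half points at model_user_page and another whose
low_half points at model_kernel_text, the same call with the same arguments
passes both checks in either case; in the second case the bytes land in
model_kernel_text.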