Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault

From: Linus Torvalds
Date: Mon Feb 18 2019 - 12:59:10 EST


[ Sorry for not looking at this earlier, I have family in town so I
was mostly busy over the weekend... ]

On Fri, Feb 15, 2019 at 4:19 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> Would changing all the mention of "kernel address" to "non user space"
> be accurate?

That would have worked, but in the meantime I just decided to pull the
existing tag, because it's not _horribly_ misleading any more and now
we're just talking details of what is a "kernel address" etc. And the
patch itself is better than what we used to have.

That said, I do agree with Andy that both the old
_copy_from_user_inatomic() thing and the new probe_mem_read() are just
fundamentally broken, nasty and slow. It just so happens that
probe_mem_read() works _reasonably_ well in practice on x86.

Basically, probe_mem_read() -> probe_kernel_read() really only works
on true kernel addresses. And some of that is very fundamental: some
architectures can have two entirely different address spaces for user
and kernel memory, so if you give _any_ function an "try to read this
address", it fundamentally has to be one or the other.

The fact that on x86, there is a unified address space for
kernel/user, and it can work for one or the other, is just
happenstance (and admittedly the common case).

So I've pulled the existing pull request, because it papers over one
particular issue, but the real fix would require:

- knowing whether it's kernel or user space you access

- actually using that knowledge to then limit the addresses we are
willing to probe, and _how_ we probe them.

The user-space case is fairly easy: just check the address with
"access_ok()", and then use _copy_user_atomic() without any set_fs()
games. That should "JustWork(tm)".

And if it's truly supposed to be a kernel address, then we probably
need to add a "is this possibly a valid kernel data pointer"
interface. Before we then do what "probe_kernel_read()" currently
does.

Linus