Re: [PATCH bpf-next 1/2] bpf: Introduce bpf_probe_write_user_registered()
From: Alexei Starovoitov
Date: Thu Apr 04 2024 - 19:24:09 EST
On Thu, Apr 4, 2024 at 12:02 PM Marco Elver <elver@xxxxxxxxxx> wrote:
>
> With all the known caveats, tracing BPF programs may directly write to
> user-space memory with the bpf_probe_write_user() helper. Memory safety
> is an obvious problem when using this helper, since it is too easy to
> overwrite memory across all running processes that user space did not
> expect to be touched (neither the verifier nor the kernel knows what may
> be touched). While it is possible to come up with mechanisms to safely
> communicate to the BPF program which memory region may be written to,
> there are no built-in guarantees of safety. For this reason, the helper
> produces a warning in the kernel log, and in newer kernels it is
> possible to disallow use of the helper since 51e1bb9eeaf7 ("bpf: Add
> lockdown check for probe_write_user helper").
So is it a fix or a feature?
> Nevertheless, direct user-space memory writes from BPF programs can be
> useful to efficiently manipulate and communicate with cooperating user
> space processes.
But there are many different ways for bpf to communicate with user space:
perf ringbuf, bpf ringbug, various maps including mmap-ed array and arena.
The commit log doesn't explain why we need another one.
> For example, one of our use cases are for events that happen relatively
> frequently in the kernel (e.g. specific scheduler events), but a set of
> user space threads want to check for such events in very hot code paths
> to make more optimal decisions (the cost of such a check can be no more
> than a load and compare). The types of events and heuristics used may
> change based on system environment and application, and a BPF program
> provides the best trade-offs in terms of performance and deployment.
and the tasks can use mmaped array shared across all or unique to each
process.
And both bpf and user space can read/write them with a single instruction.
>
> To achieve better safety, introduce tagged user writable regions, that
> must explicitly be registered before tracing BPF programs may use them:
>
> 1. The prctl() option PR_BPF_REGISTER_WRITABLE allows any user space
> process (that is allowed to use prctl()) to register tagged writable
> memory regions for the current thread.
>
> 2. Conversely, the prctl() option PR_BPF_UNREGISTER_WRITABLE allows a
> user space process to unregister a writable memory region that was
> previously registered from the current thread. This must be done
> before freeing memory if the thread that registered a region is
> still running.
>
> 3. Tracing BPF programs may write to any registered region in the
> current thread with bpf_probe_write_user_registered(). If the memory
> region has been tagged with a non-zero value, the BPF program must
> provide a matching tag.
>
> Admin capabilities are still required to attach BPF programs that use
> the new bpf_probe_write_user_registered() helper.
We stopped adding new helpers ~2 years ago.
Only new kfuncs are allowed.
>
> With this interface, user space threads are guaranteed that no writes
> happen to regions that they did not explicitly register. Tagging can be
> used to associate additional semantics with the memory region.
>
> A note on tag allocation: Since such programs can only be installed by
> the local system administrator, tag allocation may be done by the system
> administrator. For example, by providing headers with tag definitions,
> or a central service to distribute tags to the BPF program loader and to
> user applications.
Not clear how that's achieved in practice.
To do prctl(REGSISTER, ... tag)
the process will just pass this u32 tag.
There is no way for the admin or other process to enforce certain
tag usage.
Since there is no way to enforce extra tag seems like a weak
protection against something? What in particular?