Re: [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid()

From: Yonghong Song
Date: Thu Jul 21 2022 - 11:53:45 EST




On 7/21/22 5:14 AM, Jiri Olsa wrote:
On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote:
On Thu, 21 Jul 2022, Jiri Olsa wrote:

On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
The documentation for find_pid() clearly states:

typo find_vpid


"Must be called with the tasklist_lock or rcu_read_lock() held."

Presently we do neither.

just curious, did you see crash related to this or you just spot that


In an ideal world we would wrap the in-lined call to find_vpid() along
with get_pid_task() in the suggested rcu_read_lock() and have done.
However, looking at get_pid_task()'s internals, it already does that
independently, so this would lead to deadlock.

hm, we can have nested rcu_read_lock calls, right?

I assumed not, but that might be an oversight on my part.

From kernel documentation, nested rcu_read_lock is allowed.
https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html

RCU's grace-period guarantee allows updaters to wait for the completion of all pre-existing RCU read-side critical sections. An RCU read-side critical section begins with the marker rcu_read_lock() and ends with the marker rcu_read_unlock(). These markers may be nested, and RCU treats a nested set as one big RCU read-side critical section. Production-quality implementations of rcu_read_lock() and rcu_read_unlock() are extremely lightweight, and in fact have exactly zero overhead in Linux kernels built for production use with CONFIG_PREEMPT=n.


Would that be your preference?

seems simpler than calling get/put for ppid

The current implementation seems okay since we can hide
rcu_read_lock() inside find_get_pid(). We can also avoid
nested rcu_read_lock(), which is although allowed but
not pretty.


jirka


Instead, we'll use find_get_pid() which searches for the vpid, then
takes a reference to it preventing early free, all within the safety
of rcu_read_lock(). Once we have our reference we can safely make use
of it up until the point it is put.

Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Cc: John Fastabend <john.fastabend@xxxxxxxxx>
Cc: Andrii Nakryiko <andrii@xxxxxxxxxx>
Cc: Martin KaFai Lau <martin.lau@xxxxxxxxx>
Cc: Song Liu <song@xxxxxxxxxx>
Cc: Yonghong Song <yhs@xxxxxx>
Cc: KP Singh <kpsingh@xxxxxxxxxx>
Cc: Stanislav Fomichev <sdf@xxxxxxxxxx>
Cc: Hao Luo <haoluo@xxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: bpf@xxxxxxxxxxxxxxx
Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
Signed-off-by: Lee Jones <lee@xxxxxxxxxx>
---
kernel/bpf/syscall.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 83c7136c5788d..c20cff30581c4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
const struct perf_event *event;
struct task_struct *task;
struct file *file;
+ struct pid *ppid;
int err;
if (CHECK_ATTR(BPF_TASK_FD_QUERY))
@@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
if (attr->task_fd_query.flags != 0)
return -EINVAL;
- task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
+ ppid = find_get_pid(pid);
+ task = get_pid_task(ppid, PIDTYPE_PID);
+ put_pid(ppid);
if (!task)
return -ENOENT;

--
Lee Jones [李琼斯]