[PATCH bpf-next 00/18] Introduce bpf_wq

From: Benjamin Tissoires
Date: Tue Apr 16 2024 - 10:10:25 EST


This is a followup of sleepable bpf_timer[0].

When discussing sleepable bpf_timer, it was thought that we should give
a try to bpf_wq, as the 2 APIs are similar but distinct enough to
justify a new one.

So here it is.

I tried to keep as much as possible common code in kernel/bpf/helpers.c
but I couldn't get away with code duplication in kernel/bpf/verifier.c.

This series introduces a basic bpf_wq support:
- creation is supported
- assignment is supported
- running a simple bpf_wq is also supported.

We will probably need to extend the API further with:
- a full delayed_work API (can be piggy backed on top with a correct
flag)
- bpf_wq_cancel()
- bpf_wq_cancel_sync() (for sleepable programs)
- documentation

But for now, let's focus on what we currently have to see if it's worth
it compared to sleepable bpf_timer.

FWIW, I still have a couple of concerns with this implementation:
- I'm explicitely declaring the async callback as sleepable or not
(BPF_F_WQ_SLEEPABLE) through a flag. Is it really worth it?
Or should I just consider that any wq is running in a sleepable
context?
- bpf_wq_work() access ->prog without protection, but I think this might
be racing with bpf_wq_set_callback(): if we have the following:

CPU 0 CPU 1
bpf_wq_set_callback()
bpf_start()
bpf_wq_work():
prog = cb->prog;

bpf_wq_set_callback()
cb->prog = prog;
bpf_prog_put(prev)
rcu_assign_ptr(cb->callback_fn,
callback_fn);
callback = READ_ONCE(w->cb.callback_fn);

As I understand callback_fn is fine, prog might be, but we clearly
have an inconstency between "prog" and "callback_fn" as they can come
from 2 different bpf_wq_set_callback() calls.

IMO we should protect this by the async->lock, but I'm not sure if
it's OK or not.

---

For reference, the use cases I have in mind:

---

Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.

2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.

3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.

Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.

bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.

Cheers,
Benjamin

To: Alexei Starovoitov <ast@xxxxxxxxxx>
To: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
To: Andrii Nakryiko <andrii@xxxxxxxxxx>
To: Martin KaFai Lau <martin.lau@xxxxxxxxx>
To: Eduard Zingerman <eddyz87@xxxxxxxxx>
To: Song Liu <song@xxxxxxxxxx>
To: Yonghong Song <yonghong.song@xxxxxxxxx>
To: John Fastabend <john.fastabend@xxxxxxxxx>
To: KP Singh <kpsingh@xxxxxxxxxx>
To: Stanislav Fomichev <sdf@xxxxxxxxxx>
To: Hao Luo <haoluo@xxxxxxxxxx>
To: Jiri Olsa <jolsa@xxxxxxxxxx>
To: Mykola Lysenko <mykolal@xxxxxx>
To: Shuah Khan <shuah@xxxxxxxxxx>
Cc: <bpf@xxxxxxxxxxxxxxx>
Cc: <linux-kernel@xxxxxxxxxxxxxxx>
Cc: <linux-kselftest@xxxxxxxxxxxxxxx>
Signed-off-by: Benjamin Tissoires <bentiss@xxxxxxxxxx>

[0] https://lore.kernel.org/all/20240408-hid-bpf-sleepable-v6-0-0499ddd91b94@xxxxxxxxxx/

---
Benjamin Tissoires (18):
bpf: trampoline: export __bpf_prog_enter/exit_recur
bpf: make timer data struct more generic
bpf: replace bpf_timer_init with a generic helper
bpf: replace bpf_timer_set_callback with a generic helper
bpf: replace bpf_timer_cancel_and_free with a generic helper
bpf: add support for bpf_wq user type
tools: sync include/uapi/linux/bpf.h
bpf: add support for KF_ARG_PTR_TO_WORKQUEUE
bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps
selftests/bpf: add bpf_wq tests
bpf: wq: add bpf_wq_init
tools: sync include/uapi/linux/bpf.h
selftests/bpf: wq: add bpf_wq_init() checks
bpf/verifier: add is_sleepable argument to push_callback_call
bpf: wq: add bpf_wq_set_callback_impl
selftests/bpf: add checks for bpf_wq_set_callback()
bpf: add bpf_wq_start
selftests/bpf: wq: add bpf_wq_start() checks

include/linux/bpf.h | 17 +-
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 13 +
kernel/bpf/arraymap.c | 18 +-
kernel/bpf/btf.c | 17 +
kernel/bpf/hashtab.c | 55 ++-
kernel/bpf/helpers.c | 371 ++++++++++++++++-----
kernel/bpf/syscall.c | 16 +-
kernel/bpf/trampoline.c | 6 +-
kernel/bpf/verifier.c | 195 ++++++++++-
tools/include/uapi/linux/bpf.h | 13 +
tools/testing/selftests/bpf/bpf_experimental.h | 7 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/wq.c | 41 +++
tools/testing/selftests/bpf/progs/wq.c | 192 +++++++++++
tools/testing/selftests/bpf/progs/wq_failures.c | 197 +++++++++++
17 files changed, 1052 insertions(+), 113 deletions(-)
---
base-commit: ffa6b26b4d8a0520b78636ca9373ab842cb3b1a8
change-id: 20240411-bpf_wq-fe24e8d24f5e

Best regards,
--
Benjamin Tissoires <bentiss@xxxxxxxxxx>