Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode

From: Song Liu
Date: Wed Feb 05 2025 - 12:59:49 EST


On Wed, Feb 5, 2025 at 6:43 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Tue, Feb 4, 2025 at 5:53 AM Song Liu <song@xxxxxxxxxx> wrote:
> >
> > On Mon, Feb 3, 2025 at 1:45 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > [...]
> > >
> > > If you’re managing a large fleet of servers, this issue is far from negligible.
> > >
> > > >
> > > > > Can you provide examples of companies that use atomic replacement at
> > > > > scale in their production environments?
> > > >
> > > > At least SUSE uses it as a solution for its customers. No many problems
> > > > have been reported since we started ~10 years ago.
> >
> > We (Meta) always use atomic replacement for our live patches.
> >
> > >
> > > Perhaps we’re running different workloads.
> > > Going back to the original purpose of livepatching: is it designed to address
> > > security vulnerabilities, or to deploy new features?
> > > If it’s the latter, then there’s definitely a lot of room for improvement.
> >
> > We only use KLP to fix bugs and security vulnerabilities. We do not use
> > live patches to deploy new features.
>
> +BPF
>
> Hello Song,
>
> Since bpf_fexit also uses trampolines, I was curious about what would
> happen if I attached do_exit() to fexit. Unfortunately, it triggers a
> bug in BPF as well. The BPF program is as follows:
>
> SEC("fexit/do_exit")
> int fexit_do_exit
> {
> return 0;
> }
>
> After the fexit program exits, the trampoline is still left over:
>
> $ bpftool link show <<<< nothing output
> $ grep "bpf_trampoline_[0-9]" /proc/kallsyms
> ffffffffc04cb000 t bpf_trampoline_6442526459 [bpf]

I think we should first understand why the trampoline is not
freed.

> We could either add functions annotated as "__noreturn" to the deny
> list for fexit as follows, or we could explore a more generic
> solution, such as embedding the "noreturn" information into the BTF
> and extracting it when attaching fexit.

I personally don't think this is really necessary. It is good to have.
But a reasonable user should not expect noreturn function to
generate fexit events.

Thanks,
Song

> Any thoughts?
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d77abb87ffb1..37192888473c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -22742,6 +22742,13 @@ BTF_ID(func, __rcu_read_unlock)
> #endif
> BTF_SET_END(btf_id_deny)
>
> +BTF_SET_START(fexit_deny)
> +BTF_ID_UNUSED
> +/* do_exit never returns */
> +/* TODO: Add other functions annotated with __noreturn or figure out
> a generic solution */
> +BTF_ID(func, do_exit)
> +BTF_SET_END(fexit_deny)
> +
> static bool can_be_sleepable(struct bpf_prog *prog)
> {
> if (prog->type == BPF_PROG_TYPE_TRACING) {
> @@ -22830,6 +22837,9 @@ static int check_attach_btf_id(struct
> bpf_verifier_env *env)
> } else if (prog->type == BPF_PROG_TYPE_TRACING &&
> btf_id_set_contains(&btf_id_deny, btf_id)) {
> return -EINVAL;
> + } else if (prog->expected_attach_type == BPF_TRACE_FEXIT &&
> + btf_id_set_contains(&fexit_deny, btf_id)) {
> + return -EINVAL;
> }
>
> key = bpf_trampoline_compute_key(tgt_prog,
> prog->aux->attach_btf, btf_id);