Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
From: Yafang Shao
Date: Wed Feb 05 2025 - 21:55:39 EST
On Thu, Feb 6, 2025 at 1:59 AM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Wed, Feb 5, 2025 at 6:43 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > On Tue, Feb 4, 2025 at 5:53 AM Song Liu <song@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Feb 3, 2025 at 1:45 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > > [...]
> > > >
> > > > If you’re managing a large fleet of servers, this issue is far from negligible.
> > > >
> > > > >
> > > > > > Can you provide examples of companies that use atomic replacement at
> > > > > > scale in their production environments?
> > > > >
> > > > > At least SUSE uses it as a solution for its customers. No many problems
> > > > > have been reported since we started ~10 years ago.
> > >
> > > We (Meta) always use atomic replacement for our live patches.
> > >
> > > >
> > > > Perhaps we’re running different workloads.
> > > > Going back to the original purpose of livepatching: is it designed to address
> > > > security vulnerabilities, or to deploy new features?
> > > > If it’s the latter, then there’s definitely a lot of room for improvement.
> > >
> > > We only use KLP to fix bugs and security vulnerabilities. We do not use
> > > live patches to deploy new features.
> >
> > +BPF
> >
> > Hello Song,
> >
> > Since bpf_fexit also uses trampolines, I was curious about what would
> > happen if I attached do_exit() to fexit. Unfortunately, it triggers a
> > bug in BPF as well. The BPF program is as follows:
> >
> > SEC("fexit/do_exit")
> > int fexit_do_exit
> > {
> > return 0;
> > }
> >
> > After the fexit program exits, the trampoline is still left over:
> >
> > $ bpftool link show <<<< nothing output
> > $ grep "bpf_trampoline_[0-9]" /proc/kallsyms
> > ffffffffc04cb000 t bpf_trampoline_6442526459 [bpf]
>
> I think we should first understand why the trampoline is not
> freed.
IIUC, the fexit works as follows,
bpf_trampoline
+ __bpf_tramp_enter
+ percpu_ref_get(&tr->pcref);
+ call do_exit()
+ __bpf_tramp_exit
+ percpu_ref_put(&tr->pcref);
Since do_exit() never returns, the refcnt of the trampoline image is
never decremented, preventing it from being freed.
>
> > We could either add functions annotated as "__noreturn" to the deny
> > list for fexit as follows, or we could explore a more generic
> > solution, such as embedding the "noreturn" information into the BTF
> > and extracting it when attaching fexit.
>
> I personally don't think this is really necessary. It is good to have.
> But a reasonable user should not expect noreturn function to
> generate fexit events.
If we don't plan to fix it, we should clearly document it to guide
users and advise them against using it.
--
Regards
Yafang