Re: [RFC PATCH 0/4] trace, livepatch: Allow kprobe return overriding for livepatched functions

From: Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>

Date: Tue Apr 14 2026 - 20:48:50 EST


On Sun, 12 Apr 2026 21:50:31 +0800
Yafang Shao <laoar.shao@xxxxxxxxx> wrote:

> On Fri, Apr 10, 2026 at 12:38 PM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> >
> > Hi Yafang,
> >
> > On Thu, 2 Apr 2026 17:26:03 +0800
> > Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > > Livepatching allows for rapid experimentation with new kernel features
> > > without interrupting production workloads. However, static livepatches lack
> > > the flexibility required to tune features based on task-specific attributes,
> > > such as cgroup membership, which is critical in multi-tenant k8s
> > > environments. Furthermore, hardcoding logic into a livepatch prevents
> > > dynamic adjustments based on the runtime environment.
> > >
> > > To address this, we propose a hybrid approach using BPF. Our production use
> > > case involves:
> > >
> > > 1. Deploying a Livepatch function to serve as a stable BPF hook.
> > >
> > > 2. Utilizing bpf_override_return() to dynamically modify the return value
> > > of that hook based on the current task's context.
> >
> > First of all, I don't like this approach of testing a new feature in
> > the kernel, because it effectively allows multiple different
> > generations of an implementation to coexist simultaneously. The
> > standard kernel code is not designed to withstand such use.
>
> However, this approach is invaluable for rapidly deploying new kernel
> features to production servers without downtime. Upgrading kernels
> across a large fleet remains a significant challenge.

I think that, in general, downtime should be accepted as a cost of
stability. If your new kernel feature has a bug and causes a crash, it
takes your servers down anyway.

> >
> > For example, if you implement a well-designed framework in a specific
> > subsystem, like sched_ext, which allows multiple BPF-extended
> > implementations to coexist, there's no problem (or at least it's
> > debatable).
> >
> > But if it is available for any function, it is a dangerous feature.
> > Bugs that occur in kernels using this functionality cannot be
> > addressed here; they need to be treated the same way as out-of-tree
> > drivers or forked kernels. I mean, add a taint flag for this feature,
> > and then we don't have to care about it.
>
> Agreed. This should be handled as an OOT module rather than part of
> the core kernel.
>
> >
> > >
> > > A significant challenge arises when atomic-replace is enabled. In this
> > > mode, deploying a new livepatch changes the target function's address,
> > > forcing a re-attachment of the BPF program. This re-attachment latency is
> > > unacceptable in critical paths, such as those handling networking policies.
> > >
> > > To solve this, we introduce a hybrid livepatch mode that allows specific
> > > patches to remain non-replaceable, ensuring the function address remains
> > > stable and the BPF program stays attached.
> >
> > Can you share your actual problem to be solved?
>
> Here is an example we recently deployed on our production servers:
>
> https://lore.kernel.org/bpf/CALOAHbDnNba_w_nWH3-S9GAXw0+VKuLTh1gy5hy9Yqgeo4C0iA@xxxxxxxxxxxxxx/
>
> In one of our specific clusters, we needed to send BGP traffic out
> through specific NICs based on the destination IP. To achieve this
> without interrupting service, we live-patched
> bond_xmit_3ad_xor_slave_get(), added a new hook called
> bond_get_slave_hook(), and then ran a BPF program attached to that
> hook to select the outgoing NIC from the SKB. This allowed us to
> rapidly deploy the feature with zero downtime.

In this case, you can build a specific livepatch or kernel module that
replaces the kernel function on your server, without using BPF.
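A minimal sketch of what I mean, using the standard klp API. Note that
the signature of bond_xmit_3ad_xor_slave_get() and the policy body are
assumptions for illustration, not code from your series:

```c
/* Hypothetical sketch: a plain livepatch module that replaces the
 * bonding function directly, with the NIC-selection policy written in C
 * instead of attached via BPF. The replaced function's signature is an
 * assumption based on the bonding driver.
 */
#include <linux/module.h>
#include <linux/livepatch.h>
#include <net/bonding.h>

static struct slave *livepatch_bond_slave_get(struct bonding *bond,
					      struct sk_buff *skb)
{
	/* Placeholder: a real patch would reimplement the 3ad/xor hash
	 * here plus the destination-IP based BGP policy. */
	return NULL;
}

static struct klp_func funcs[] = {
	{
		.old_name = "bond_xmit_3ad_xor_slave_get",
		.new_func = livepatch_bond_slave_get,
	},
	{ }
};

static struct klp_object objs[] = {
	{
		.name = "bonding",	/* patch the bonding module */
		.funcs = funcs,
	},
	{ }
};

static struct klp_patch patch = {
	.mod = THIS_MODULE,
	.objs = objs,
};

static int livepatch_init(void)
{
	return klp_enable_patch(&patch);
}

static void livepatch_exit(void)
{
}

module_init(livepatch_init);
module_exit(livepatch_exit);
MODULE_LICENSE("GPL");
MODULE_INFO(livepatch, "Y");
```

This keeps the whole change in one reviewable object, with no hook
address to keep stable across atomic-replace.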

The BGP traffic handling in the bonding device seems very specific, so
it may not cause trouble, but this is a very generic change, which
allows users to modify more core kernel features, e.g. memory
management, the scheduler, etc.

Excessive degrees of freedom introduce uncertainty and instability into
a system. While the functionality is interesting, it would amount to
generalizing sched_ext in an uncontrolled way.

At a minimum, some form of build-time and runtime constraint, along
with a taint flag that clearly indicates in crash logs that this
feature is in use, would be necessary. (This means it should not be
used in a production environment.)
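For the taint part, something along these lines (the warning text and
the hook site are made up for illustration; only add_taint() and
TAINT_LIVEPATCH are existing kernel APIs):

```c
/* Hypothetical sketch: taint the kernel the first time a BPF program
 * overrides the return value of a livepatched function, so oops output
 * ("Tainted: ...") clearly shows the feature was in use. Reusing
 * TAINT_LIVEPATCH here is an assumption; a dedicated flag could make
 * the combination explicit.
 */
#include <linux/kernel.h>
#include <linux/printk.h>

static void note_bpf_override_on_livepatch(void)
{
	add_taint(TAINT_LIVEPATCH, LOCKDEP_STILL_OK);
	pr_warn_once("kprobe return override on livepatched function; kernel tainted\n");
}
```

Then bug reports from such kernels can be triaged accordingly.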

Thank you,
>
> [...]
>
> --
> Regards
> Yafang


--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>