Re: [PATCH v3 bpf-next 4/4] bpf: support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
From: Song Liu
Date: Mon Jul 18 2022 - 12:55:56 EST
> On Jul 18, 2022, at 6:07 AM, Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> On Sun 2022-07-17 17:14:05, Song Liu wrote:
>> When tracing a function with IPMODIFY ftrace_ops (livepatch), the bpf
>> trampoline must follow the instruction pointer saved on stack. This needs
>> extra handling for bpf trampolines with BPF_TRAMP_F_CALL_ORIG flag.
>>
>> Implement bpf_tramp_ftrace_ops_func and use it for the ftrace_ops used
>> by BPF trampoline. This enables tracing functions with livepatch.
>>
>> This also requires moving bpf trampoline to *_ftrace_direct_mult APIs.
>>
>> --- a/kernel/bpf/trampoline.c
>> +++ b/kernel/bpf/trampoline.c
>> @@ -13,6 +13,7 @@
>> #include <linux/static_call.h>
>> #include <linux/bpf_verifier.h>
>> #include <linux/bpf_lsm.h>
>> +#include <linux/delay.h>
>>
>> /* dummy _ops. The verifier will operate on target program's ops. */
>> const struct bpf_verifier_ops bpf_extension_verifier_ops = {
>> @@ -29,6 +30,81 @@ static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE];
>> /* serializes access to trampoline_table */
>> static DEFINE_MUTEX(trampoline_mutex);
>>
>> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>> +static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex);
>> +
>> +static int bpf_tramp_ftrace_ops_func(struct ftrace_ops *ops, enum ftrace_ops_cmd cmd)
>> +{
>> + struct bpf_trampoline *tr = ops->private;
>> + int ret = 0;
>> +
>> + if (cmd == FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF) {
>> + /* This is called inside register_ftrace_direct_multi(), so
>> + * tr->mutex is already locked.
>> + */
>> + WARN_ON_ONCE(!mutex_is_locked(&tr->mutex));
>
> Again, better is:
>
> lockdep_assert_held_once(&tr->mutex);
Will fix.
>
>> +
>> + /* Instead of updating the trampoline here, we propagate
>> + * -EAGAIN to register_ftrace_direct_multi(). Then we can
>> + * retry register_ftrace_direct_multi() after updating the
>> + * trampoline.
>> + */
>> + if ((tr->flags & BPF_TRAMP_F_CALL_ORIG) &&
>> + !(tr->flags & BPF_TRAMP_F_ORIG_STACK)) {
>> + if (WARN_ON_ONCE(tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY))
>> + return -EBUSY;
>> +
>> + tr->flags |= BPF_TRAMP_F_SHARE_IPMODIFY;
>> + return -EAGAIN;
>> + }
>> +
>> + return 0;
>> + }
>> +
>> + /* The normal locking order is
>> + * tr->mutex => direct_mutex (ftrace.c) => ftrace_lock (ftrace.c)
>> + *
>> + * The following two commands are called from
>> + *
>> + * prepare_direct_functions_for_ipmodify
>> + * cleanup_direct_functions_after_ipmodify
>> + *
>> + * In both cases, direct_mutex is already locked. Use
>> + * mutex_trylock(&tr->mutex) to avoid deadlock in race condition
>> + * (something else is making changes to this same trampoline).
>> + */
>> + if (!mutex_trylock(&tr->mutex)) {
>> + /* sleep 1 ms to make sure whatever holding tr->mutex makes
>> + * some progress.
>> + */
>> + msleep(1);
>> + return -EAGAIN;
>> + }
>
> Huh, this looks horrible. And I do not get it. The above block prints
> a warning when the mutex is not taken. Why it is already taken
> when cmd == FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF
> and why it has to be explicitly taken otherwise?
There are two different scenarios:
1. livepatch applied first, then bpf trampoline is registered.
In this case, we call ops_func(FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF).
_SELF means we are making change to the DIRECT ops (bpf trampoline) itself.
In this case, tr->mutex is already taken.
2. bpf_trampoline registered first, then livepatch is applied.
In this case, ftrace cannot take tr->mutex first. Instead, ftrace has to
lock direct_mutex, find any conflict DIRECT ops, and call
ops_func(FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER). _PEER means this is
called by a peer ftrace ops (livepatch).
>
> Would it be possible to call prepare_direct_functions_for_ipmodify(),
> cleanup_direct_functions_after_ipmodify() with rt->mutex already taken
> so that the ordering is correct even in this case.
>
> That said, this is the first version when I am in Cc. I am not sure
> if it has already been discussed.
>
>
>> + switch (cmd) {
>> + case FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER:
>> + tr->flags |= BPF_TRAMP_F_SHARE_IPMODIFY;
>> +
>> + if ((tr->flags & BPF_TRAMP_F_CALL_ORIG) &&
>> + !(tr->flags & BPF_TRAMP_F_ORIG_STACK))
>> + ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
>> + break;
>> + case FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER:
>> + tr->flags &= ~BPF_TRAMP_F_SHARE_IPMODIFY;
>> +
>> + if (tr->flags & BPF_TRAMP_F_ORIG_STACK)
>> + ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
>> + break;
>> + default:
>> + ret = -EINVAL;
>> + break;
>> + };
>> +
>> + mutex_unlock(&tr->mutex);
>> + return ret;
>> +}
>> +#endif
>> +
>> bool bpf_prog_has_trampoline(const struct bpf_prog *prog)
>> {
>> enum bpf_attach_type eatype = prog->expected_attach_type;
>
> Note that I did not do proper review. I not much familiar with the
> ftrace code. I just wanted to check how much this patchset affects
> livepatching and noticed the commented things.
Before this set, livepatch and bpf trampoline cannot work on the same
function. Whichever applied latter will fail. After this, the two
will work almost perfectly together. By "almost" I mean the race
condition protected by the mutex_trylock:
if (!mutex_trylock(&tr->mutex)) {
/* sleep 1 ms to make sure whatever holding tr->mutex makes
* some progress.
*/
msleep(1);
return -EAGAIN;
}
If livepatch is applied while something is making changes to the bpf
trampoline at the same time, livepatch code will get -EAGAIN from
register_ftrace_function(). Then we need to retry from livepatch side.
AFAICT, kpatch user space already does the retry, so it gonna work
as-is. I am not sure about other user space though.
The msleep(1) is suggested by Steven to avoid deadlock in RT case [1].
Does this make sense?
Thanks,
Song
[1] https://lore.kernel.org/bpf/20220706211858.67f9254d@xxxxxxxxxxxxxxxxxxxx/