Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

From: Andrii Nakryiko
Date: Tue Dec 01 2020 - 19:56:26 EST


On Fri, Nov 27, 2020 at 8:09 AM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 11/27/20 3:20 AM, KP Singh wrote:
> > On Fri, Nov 27, 2020 at 8:35 AM Yonghong Song <yhs@xxxxxx> wrote:
> >>
> >>
> >>
> >> On 11/26/20 8:57 AM, Florent Revest wrote:
> >>> This helper exposes the kallsyms_lookup function to eBPF tracing
> >>> programs. This can be used to retrieve the name of the symbol at an
> >>> address. For example, when hooking into nf_register_net_hook, one can
> >>> audit the name of the registered netfilter hook and potentially also
> >>> the name of the module in which the symbol is located.
> >>>
> >>> Signed-off-by: Florent Revest <revest@xxxxxxxxxx>
> >>> ---
> >>> include/uapi/linux/bpf.h | 16 +++++++++++++
> >>> kernel/trace/bpf_trace.c | 41 ++++++++++++++++++++++++++++++++++
> >>> tools/include/uapi/linux/bpf.h | 16 +++++++++++++
> >>> 3 files changed, 73 insertions(+)
> >>>
> >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>> index c3458ec1f30a..670998635eac 100644
> >>> --- a/include/uapi/linux/bpf.h
> >>> +++ b/include/uapi/linux/bpf.h
> >>> @@ -3817,6 +3817,21 @@ union bpf_attr {
> >>> * The **hash_algo** is returned on success,
> >>> * **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
> >>> * invalid arguments are passed.
> >>> + *
> >>> + * long bpf_kallsyms_lookup(u64 address, char *symbol, u32 symbol_size, char *module, u32 module_size)
> >>> + * Description
> >>> + * Uses kallsyms to write the name of the symbol at *address*
> >>> + * into *symbol* of size *symbol_sz*. This is guaranteed to be
> >>> + * zero terminated.
> >>> + * If the symbol is in a module, up to *module_size* bytes of
> >>> + * the module name is written in *module*. This is also
> >>> + * guaranteed to be zero-terminated. Note: a module name
> >>> + * is always shorter than 64 bytes.
> >>> + * Return
> >>> + * On success, the strictly positive length of the full symbol
> >>> + * name, If this is greater than *symbol_size*, the written
> >>> + * symbol is truncated.
> >>> + * On error, a negative value.
> >>> */
> >>> #define __BPF_FUNC_MAPPER(FN) \
> >>> FN(unspec), \
> >>> @@ -3981,6 +3996,7 @@ union bpf_attr {
> >>> FN(bprm_opts_set), \
> >>> FN(ktime_get_coarse_ns), \
> >>> FN(ima_inode_hash), \
> >>> + FN(kallsyms_lookup), \
> >>> /* */
> >>>
> >>> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> >>> index d255bc9b2bfa..9d86e20c2b13 100644
> >>> --- a/kernel/trace/bpf_trace.c
> >>> +++ b/kernel/trace/bpf_trace.c
> >>> @@ -17,6 +17,7 @@
> >>> #include <linux/error-injection.h>
> >>> #include <linux/btf_ids.h>
> >>> #include <linux/bpf_lsm.h>
> >>> +#include <linux/kallsyms.h>
> >>>
> >>> #include <net/bpf_sk_storage.h>
> >>>
> >>> @@ -1260,6 +1261,44 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = {
> >>> .arg5_type = ARG_ANYTHING,
> >>> };
> >>>
> >>> +BPF_CALL_5(bpf_kallsyms_lookup, u64, address, char *, symbol, u32, symbol_size,
> >>> + char *, module, u32, module_size)
> >>> +{
> >>> + char buffer[KSYM_SYMBOL_LEN];
> >>> + unsigned long offset, size;
> >>> + const char *name;
> >>> + char *modname;
> >>> + long ret;
> >>> +
> >>> + name = kallsyms_lookup(address, &size, &offset, &modname, buffer);
> >>> + if (!name)
> >>> + return -EINVAL;
> >>> +
> >>> + ret = strlen(name) + 1;
> >>> + if (symbol_size) {
> >>> + strncpy(symbol, name, symbol_size);
> >>> + symbol[symbol_size - 1] = '\0';
> >>> + }
> >>> +
> >>> + if (modname && module_size) {
> >>> + strncpy(module, modname, module_size);
> >>> + module[module_size - 1] = '\0';
> >>
> >> In this case, the module name may be truncated and the user does not get
> >> any indication of that from the return value. The helper description
> >> mentions that a module name is currently at most 64 bytes. But from a UAPI
> >> perspective, it may still be good to return something to let the user know
> >> the name is truncated.
> >>
> >> I do not know what is the best way to do this. One suggestion is
> >> to break it into two helpers, one for symbol name and another
> >
> > I think it would be slightly preferable to have one helper though.
> > maybe something like bpf_get_symbol_info (better names anyone? :))
> > with flags to get the module name or the symbol name depending
> > on the flag?
>
> This works even better. Previously I was thinking that if we have two
> helpers, we can add flags to each of them for future extension. But we
> can certainly have just one helper with flags to indicate
> whether this is for the module name, the symbol name, or something else.
>
> The buffer can be something like
> union bpf_ksymbol_info {
> char module_name[];
> char symbol_name[];
> ...
> }
> and flags will indicate what information user wants.

One more thing that might be useful is resolving to the symbol's "base
address". E.g., if we have an IP inside the function, this would resolve
to the start of the function, a sort of "canonical" symbol address. The
type of the ksym is another "characteristic" that could be returned (as a
single char?)

I wouldn't define bpf_ksymbol_info, though. Just specify, depending on
the flag, what kind of memory layout is used (e.g., for strings, a
zero-terminated string; for an address, an 8-byte number; etc). That way
we can also allow fetching multiple things together; they would just
be laid out one after another in memory.

E.g.:

char buf[256];
int err = bpf_ksym_resolve(<addr>, BPF_KSYM_NAME | BPF_KSYM_MODNAME |
                           BPF_KSYM_BASE_ADDR, buf, sizeof(buf));

if (err == -E2BIG) {
        /* need bigger buffer, but all the data up to truncation point is filled in */
} else {
        /* err has exact number of bytes used, including zero terminator(s) */
        /* data is laid out as
         * "cpufreq_gov_powersave_init\0cpufreq_powersave\0\x12\x23\x45\x56\x12\x23\x45\x56"
         */
}
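
To make the layout concrete, here is a rough user-space sketch of the packing
I have in mind. Everything in it is made up for illustration: the flag values,
struct ksym_rec and ksym_pack() are hypothetical and only simulate what such a
helper could do with an already-resolved symbol. It assumes fields are emitted
in flag-bit order and that a built-in symbol gets an empty module name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values; the thread names the flags but not their values. */
enum {
	BPF_KSYM_NAME      = 1 << 0,
	BPF_KSYM_MODNAME   = 1 << 1,
	BPF_KSYM_BASE_ADDR = 1 << 2,
};

/* Stand-in for the result of a kallsyms lookup (assumption, not a kernel type). */
struct ksym_rec {
	const char *name;
	const char *modname;	/* NULL for symbols built into the kernel */
	uint64_t base_addr;
};

/* Copy as much of src as fits into buf, but always account for all n bytes. */
static void put(char *buf, size_t size, size_t *used, const void *src, size_t n)
{
	if (*used < size) {
		size_t room = size - *used;

		memcpy(buf + *used, src, n < room ? n : room);
	}
	*used += n;
}

/*
 * Lay out the requested fields back to back: zero-terminated strings first,
 * then the 8-byte base address. Returns the number of bytes used, or -E2BIG
 * if buf was too small (data up to the truncation point is still filled in).
 */
static long ksym_pack(const struct ksym_rec *sym, unsigned int flags,
		      char *buf, size_t size)
{
	size_t used = 0;

	assert(sym && buf);

	if (flags & BPF_KSYM_NAME)
		put(buf, size, &used, sym->name, strlen(sym->name) + 1);
	if (flags & BPF_KSYM_MODNAME) {
		const char *mod = sym->modname ? sym->modname : "";

		put(buf, size, &used, mod, strlen(mod) + 1);
	}
	if (flags & BPF_KSYM_BASE_ADDR)
		put(buf, size, &used, &sym->base_addr, sizeof(sym->base_addr));

	return used > size ? -E2BIG : (long)used;
}
```

A caller walks the buffer in the same order as the flags it passed: skip
strlen() + 1 bytes per string, then memcpy() the address out, since it is not
guaranteed to be aligned inside the buffer.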


>
> >
> >> for module name. What are the use cases where people want to get both
> >> symbol name and module name, and are they common?
> >
> > The use case would be to disambiguate symbols in the
> > kernel from the ones from a kernel module. Similar to what
> > /proc/kallsyms does:
> >
> > T cpufreq_gov_powersave_init [cpufreq_powersave]
> >
> >>
> >>> + }
> >>> +
> >>> + return ret;
> >>> +}
> >>> +
> >>> +const struct bpf_func_proto bpf_kallsyms_lookup_proto = {
> >>> + .func = bpf_kallsyms_lookup,
> >>> + .gpl_only = false,
> >>> + .ret_type = RET_INTEGER,
> >>> + .arg1_type = ARG_ANYTHING,
> >>> + .arg2_type = ARG_PTR_TO_MEM,
> >> ARG_PTR_TO_UNINIT_MEM?
> >>
> >>> + .arg3_type = ARG_CONST_SIZE,
> >> ARG_CONST_SIZE_OR_ZERO? This is especially true for current format
> >> which tries to return both symbol name and module name and
> >> user may just want to do one of them.
> >>
> >>> + .arg4_type = ARG_PTR_TO_MEM,
> >> ARG_PTR_TO_UNINIT_MEM?
> >>
> >>> + .arg5_type = ARG_CONST_SIZE,
> >> ARG_CONST_SIZE_OR_ZERO?
> >>
> >>> +};
> >>> +
> >>> const struct bpf_func_proto *
> >>> bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >>> {
> >>> @@ -1356,6 +1395,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >>> return &bpf_per_cpu_ptr_proto;
> >>> case BPF_FUNC_bpf_this_cpu_ptr:
> >>> return &bpf_this_cpu_ptr_proto;
> >>> + case BPF_FUNC_kallsyms_lookup:
> >>> + return &bpf_kallsyms_lookup_proto;
> >>> default:
> >>> return NULL;
> >>> }
> >> [...]