Re: ftrace: Save module init functions kallsyms symbols for tracing

From: Steven Rostedt
Date: Mon Oct 09 2017 - 15:51:10 EST


On Fri, 6 Oct 2017 22:55:15 +0200
Jessica Yu <jeyu@xxxxxxxxxx> wrote:

> +++ Steven Rostedt [19/09/17 14:28 +0000]:
> >From: "Steven Rostedt (VMware)" <rostedt@xxxxxxxxxxx>
> >
> >If function tracing is active when the module init functions are freed, then
> >store them to be referenced by kallsyms. As module init functions can now be
> >traced on module load, they were useless:
> >
> > ># echo ':mod:snd_seq' > set_ftrace_filter
> > ># echo function > current_tracer
> > ># modprobe snd_seq
> > ># cat trace
> > # tracer: function
> > #
> > # _-----=> irqs-off
> > # / _----=> need-resched
> > # | / _---=> hardirq/softirq
> > # || / _--=> preempt-depth
> > # ||| / delay
> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> > # | | | |||| | |
> > modprobe-2786 [000] .... 3189.037874: 0xffffffffa0860000 <-do_one_initcall
> > modprobe-2786 [000] .... 3189.037876: 0xffffffffa086004d <-0xffffffffa086000f
> > modprobe-2786 [000] .... 3189.037876: 0xffffffffa086010d <-0xffffffffa0860018
> > modprobe-2786 [000] .... 3189.037877: 0xffffffffa086011a <-0xffffffffa0860021
> > modprobe-2786 [000] .... 3189.037877: 0xffffffffa0860080 <-0xffffffffa086002a
> > modprobe-2786 [000] .... 3189.039523: 0xffffffffa0860400 <-0xffffffffa0860033
> > modprobe-2786 [000] .... 3189.039523: 0xffffffffa086038a <-0xffffffffa086041c
> > modprobe-2786 [000] .... 3189.039591: 0xffffffffa086038a <-0xffffffffa0860436
> > modprobe-2786 [000] .... 3189.039657: 0xffffffffa086038a <-0xffffffffa0860450
> > modprobe-2786 [000] .... 3189.039719: 0xffffffffa0860127 <-0xffffffffa086003c
> > modprobe-2786 [000] .... 3189.039742: snd_seq_create_kernel_client <-0xffffffffa08601f6
> >
> >When the output is shown, the kallsyms for the module init functions have
> >already been freed, and the output of the trace can not convert them to
> >their function names.
> >
> >Now this looks like this:
> >
> > # tracer: function
> > #
> > # _-----=> irqs-off
> > # / _----=> need-resched
> > # | / _---=> hardirq/softirq
> > # || / _--=> preempt-depth
> > # ||| / delay
> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> > # | | | |||| | |
> > modprobe-2463 [002] .... 174.243237: alsa_seq_init <-do_one_initcall
> > modprobe-2463 [002] .... 174.243239: client_init_data <-alsa_seq_init
> > modprobe-2463 [002] .... 174.243240: snd_sequencer_memory_init <-alsa_seq_init
> > modprobe-2463 [002] .... 174.243240: snd_seq_queues_init <-alsa_seq_init
> > modprobe-2463 [002] .... 174.243240: snd_sequencer_device_init <-alsa_seq_init
> > modprobe-2463 [002] .... 174.244860: snd_seq_info_init <-alsa_seq_init
> > modprobe-2463 [002] .... 174.244861: create_info_entry <-snd_seq_info_init
> > modprobe-2463 [002] .... 174.244936: create_info_entry <-snd_seq_info_init
> > modprobe-2463 [002] .... 174.245003: create_info_entry <-snd_seq_info_init
> > modprobe-2463 [002] .... 174.245072: snd_seq_system_client_init <-alsa_seq_init
> > modprobe-2463 [002] .... 174.245094: snd_seq_create_kernel_client <-snd_seq_system_client_init
> >
> >Signed-off-by: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
>
> Hi Steven,
>
> Hm, I think there may be a way to make this easier and avoid all this
> copying and freeing of ftrace_mod* descriptors. It is trivially easy
> to save module init symbol information in the module's copy of the
> symbol table (mod->kallsyms). Currently the module loader just ignores
> symbols in init sections when building the module symtab. If we just
> make that information available, ftrace can easily walk the module's
> symtab to look for the init function symbol, its (former, before it
> was freed) address, and its name in the module's strtab. If this
> change is implemented we could probably omit patch 05 as well.
>
> This might look like the following (patch applies on top of patches 01-04)
> (not thoroughly tested yet!):

I just tried it. ;-)

This does simplify things, but it also causes some of the same issues
I've been trying to fix as well. But perhaps we can live with them.

I did:

# rmmod kvm_intel
# echo ":mod:kvm_intel" > set_ftrace_filter
# echo function > current_tracer
# modprobe kvm_intel
# rmmod ip6table_filter
# modprobe ip6table_filter
# cat trace
# tracer: function
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
modprobe-2856 [001] .... 15513.473966: vmx_init <-do_one_initcall
modprobe-2856 [001] .... 15513.473968: ip6table_filter_init <-kvm_arch_init
modprobe-2856 [001] .... 15513.473968: vmx_disabled_by_bios <-kvm_arch_init
modprobe-2856 [001] .... 15513.479503: hardware_setup <-kvm_arch_hardware_setup
modprobe-2856 [001] .... 15513.479514: setup_vmcs_config <-hardware_setup

Notice the "ip6table_filter_init" called by kvm_arch_init ;-)

I guess that's fine, as I don't have a solution for that either.

I did notice this though:

# grep ffffffffa0308000 /proc/kallsyms
ffffffffa0308000 t ip6table_filter_init [ip6table_filter]
ffffffffa0308000 t init_module [ip6table_filter]
ffffffffa0308000 t cpu_has_kvm_support [kvm_intel]
ffffffffa0308000 t generic_driver_init [snd_hda_codec_generic]
ffffffffa0308000 t init_module [snd_hda_codec_generic]

As long as the last module loaded is the first one that gets picked up
by kallsyms, we should be good.

As I finished all my tests, I want to still post the changes. But if
this proves to be a better method, I can just revert them.

-- Steve


>
> -->8--
> diff --git a/include/linux/module.h b/include/linux/module.h
> index fe5aa3736707..8a8bdf8397d3 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -541,6 +541,10 @@ const struct kernel_symbol *find_symbol(const char *name,
> bool gplok,
> bool warn);
>
> +const char *get_module_ksymbol(struct module *mod,
> + unsigned long addr,
> + unsigned long *size,
> + unsigned long *offset);
> /*
> * Walk the exported symbol table
> *
> diff --git a/kernel/module.c b/kernel/module.c
> index 279a469dc375..db85e47d4f2f 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2596,7 +2596,7 @@ static bool is_core_symbol(const Elf_Sym *src, const Elf_Shdr *sechdrs,
> #ifndef CONFIG_KALLSYMS_ALL
> || !(sec->sh_flags & SHF_EXECINSTR)
> #endif
> - || (sec->sh_entsize & INIT_OFFSET_MASK))
> + )
> return false;
>
> return true;
> @@ -3885,25 +3885,20 @@ static const char *symname(struct mod_kallsyms *kallsyms, unsigned int symnum)
> return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
> }
>
> -static const char *get_ksymbol(struct module *mod,
> +const char *get_module_ksymbol(struct module *mod,
> unsigned long addr,
> unsigned long *size,
> unsigned long *offset)
> {
> - unsigned int i, best = 0;
> - unsigned long nextval;
> + unsigned int i, found = 0;
> struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms);
> + Elf_Sym *sym, *symtab = kallsyms->symtab;
>
> - /* At worse, next value is at end of module */
> - if (within_module_init(addr, mod))
> - nextval = (unsigned long)mod->init_layout.base+mod->init_layout.text_size;
> - else
> - nextval = (unsigned long)mod->core_layout.base+mod->core_layout.text_size;
> -
> - /* Scan for closest preceding symbol, and next symbol. (ELF
> - starts real symbols at 1). */
> + /* Note: ELF symbols in a symbol table start at index 1). */
> for (i = 1; i < kallsyms->num_symtab; i++) {
> - if (kallsyms->symtab[i].st_shndx == SHN_UNDEF)
> + sym = &symtab[i];
> +
> + if (sym->st_shndx == SHN_UNDEF)
> continue;
>
> /* We ignore unnamed symbols: they're uninformative
> @@ -3912,22 +3907,30 @@ static const char *get_ksymbol(struct module *mod,
> || is_arm_mapping_symbol(symname(kallsyms, i)))
> continue;
>
> - if (kallsyms->symtab[i].st_value <= addr
> - && kallsyms->symtab[i].st_value > kallsyms->symtab[best].st_value)
> - best = i;
> - if (kallsyms->symtab[i].st_value > addr
> - && kallsyms->symtab[i].st_value < nextval)
> - nextval = kallsyms->symtab[i].st_value;
> + /*
> + * These are "internal" symbols aliased to the real init and
> + * exit functions in a module. But they're not particularly
> + * informative in oopses/tracing. Skip them and use the module's
> + * real init/exit functions instead.
> + */
> + if (strcmp(symname(kallsyms, i), "init_module") == 0 ||
> + strcmp(symname(kallsyms, i), "cleanup_module") == 0)
> + continue;
> +
> + if (sym->st_value <= addr &&
> + addr < sym->st_value + sym->st_size)
> + found = i;
> }
>
> - if (!best)
> + if (!found)
> return NULL;
>
> if (size)
> - *size = nextval - kallsyms->symtab[best].st_value;
> + *size = symtab[found].st_size;
> if (offset)
> - *offset = addr - kallsyms->symtab[best].st_value;
> - return symname(kallsyms, best);
> + *offset = addr - symtab[found].st_value;
> +
> + return symname(kallsyms, found);
> }
>
> /* For kallsyms to ask for address resolution. NULL means not found. Careful
> @@ -3946,7 +3949,7 @@ const char *module_address_lookup(unsigned long addr,
> if (mod) {
> if (modname)
> *modname = mod->name;
> - ret = get_ksymbol(mod, addr, size, offset);
> + ret = get_module_ksymbol(mod, addr, size, offset);
> }
> /* Make a copy in here where it's safe */
> if (ret) {
> @@ -3969,7 +3972,7 @@ int lookup_module_symbol_name(unsigned long addr, char *symname)
> if (within_module(addr, mod)) {
> const char *sym;
>
> - sym = get_ksymbol(mod, addr, NULL, NULL);
> + sym = get_module_ksymbol(mod, addr, NULL, NULL);
> if (!sym)
> goto out;
> strlcpy(symname, sym, KSYM_NAME_LEN);
> @@ -3994,7 +3997,7 @@ int lookup_module_symbol_attrs(unsigned long addr, unsigned long *size,
> if (within_module(addr, mod)) {
> const char *sym;
>
> - sym = get_ksymbol(mod, addr, size, offset);
> + sym = get_module_ksymbol(mod, addr, size, offset);
> if (!sym)
> goto out;
> if (modname)
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index f0af3573b808..c9862ddd3954 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -5675,20 +5675,12 @@ static int ftrace_process_locs(struct module *mod,
> return ret;
> }
>
> -struct ftrace_mod_func {
> - struct list_head list;
> - char *name;
> - unsigned long ip;
> - unsigned int size;
> -};
> -
> struct ftrace_mod_map {
> struct rcu_head rcu;
> struct list_head list;
> struct module *mod;
> unsigned long start_addr;
> unsigned long end_addr;
> - struct list_head funcs;
> };
>
> #ifdef CONFIG_MODULES
> @@ -5753,15 +5745,6 @@ static void clear_mod_from_hashes(struct ftrace_page *pg)
> static void ftrace_free_mod_map(struct rcu_head *rcu)
> {
> struct ftrace_mod_map *mod_map = container_of(rcu, struct ftrace_mod_map, rcu);
> - struct ftrace_mod_func *mod_func;
> - struct ftrace_mod_func *n;
> -
> - /* All the contents of mod_map are now not visible to readers */
> - list_for_each_entry_safe(mod_func, n, &mod_map->funcs, list) {
> - kfree(mod_func->name);
> - list_del(&mod_func->list);
> - kfree(mod_func);
> - }
>
> kfree(mod_map);
> }
> @@ -5912,36 +5895,6 @@ void ftrace_module_init(struct module *mod)
> mod->ftrace_callsites + mod->num_ftrace_callsites);
> }
>
> -static void save_ftrace_mod_rec(struct ftrace_mod_map *mod_map,
> - struct dyn_ftrace *rec)
> -{
> - struct ftrace_mod_func *mod_func;
> - unsigned long symsize;
> - unsigned long offset;
> - char str[KSYM_SYMBOL_LEN];
> - char *modname;
> - const char *ret;
> -
> - ret = kallsyms_lookup(rec->ip, &symsize, &offset, &modname, str);
> - if (!ret)
> - return;
> -
> - mod_func = kmalloc(sizeof(*mod_func), GFP_KERNEL);
> - if (!mod_func)
> - return;
> -
> - mod_func->name = kstrdup(str, GFP_KERNEL);
> - if (!mod_func->name) {
> - kfree(mod_func);
> - return;
> - }
> -
> - mod_func->ip = rec->ip - offset;
> - mod_func->size = symsize;
> -
> - list_add_rcu(&mod_func->list, &mod_map->funcs);
> -}
> -
> static struct ftrace_mod_map *
> allocate_ftrace_mod_map(struct module *mod,
> unsigned long start, unsigned long end)
> @@ -5956,43 +5909,11 @@ allocate_ftrace_mod_map(struct module *mod,
> mod_map->start_addr = start;
> mod_map->end_addr = end;
>
> - INIT_LIST_HEAD_RCU(&mod_map->funcs);
> -
> list_add_rcu(&mod_map->list, &ftrace_mod_maps);
>
> return mod_map;
> }
>
> -static const char *
> -ftrace_func_address_lookup(struct ftrace_mod_map *mod_map,
> - unsigned long addr, unsigned long *size,
> - unsigned long *off, char *sym)
> -{
> - struct ftrace_mod_func *found_func = NULL;
> - struct ftrace_mod_func *mod_func;
> -
> - list_for_each_entry_rcu(mod_func, &mod_map->funcs, list) {
> - if (addr >= mod_func->ip &&
> - addr < mod_func->ip + mod_func->size) {
> - found_func = mod_func;
> - break;
> - }
> - }
> -
> - if (found_func) {
> - if (size)
> - *size = found_func->size;
> - if (off)
> - *off = addr - found_func->ip;
> - if (sym)
> - strlcpy(sym, found_func->name, KSYM_NAME_LEN);
> -
> - return found_func->name;
> - }
> -
> - return NULL;
> -}
> -
> const char *
> ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
> unsigned long *off, char **modname, char *sym)
> @@ -6003,10 +5924,12 @@ ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
> /* mod_map is freed via call_rcu_sched() */
> preempt_disable();
> list_for_each_entry_rcu(mod_map, &ftrace_mod_maps, list) {
> - ret = ftrace_func_address_lookup(mod_map, addr, size, off, sym);
> + ret = get_module_ksymbol(mod_map->mod, addr, size, off);
> if (ret) {
> if (modname)
> *modname = mod_map->mod->name;
> + if (sym)
> + strlcpy(sym, ret, KSYM_NAME_LEN);
> break;
> }
> }
> @@ -6060,9 +5983,6 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
> if (!rec)
> continue;
>
> - if (mod_map)
> - save_ftrace_mod_rec(mod_map, rec);
> -
> pg->index--;
> ftrace_update_tot_cnt--;
> if (!pg->index) {
> -->8--
>
> Using the snd_seq example from patch 03, I get the same human-readable trace:
>
> # tracer: function
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
> modprobe-12748 [005] .... 138.964923: alsa_seq_init <-do_one_initcall
> modprobe-12748 [005] .... 138.964924: client_init_data <-alsa_seq_init
> modprobe-12748 [005] .... 138.964924: snd_sequencer_memory_init <-alsa_seq_init
> modprobe-12748 [005] .... 138.964924: snd_seq_queues_init <-alsa_seq_init
> modprobe-12748 [005] .... 138.964924: snd_sequencer_device_init <-alsa_seq_init
> modprobe-12748 [005] .... 138.964956: snd_seq_info_init <-alsa_seq_init
> modprobe-12748 [005] .... 138.964956: create_info_entry <-snd_seq_info_init
> modprobe-12748 [005] .... 138.964958: create_info_entry <-snd_seq_info_init
> modprobe-12748 [005] .... 138.964959: create_info_entry <-snd_seq_info_init
> modprobe-12748 [005] .... 138.964960: snd_seq_system_client_init <-alsa_seq_init
> modprobe-12748 [005] .... 138.964960: snd_seq_create_kernel_client <-snd_seq_system_client_init
>
> In addition, the module init functions appear in /proc/kallsyms as
> well. Thoughts? Would saving init symbol information in the module
> symtab be sufficient for ftrace?
>
> Thanks,
>
> Jessica