Re: [RFC PATCH v2] trace: Add trace any kernel object

From: Huan Xie
Date: Tue Oct 26 2021 - 11:51:51 EST


On Tue, Oct 26, 2021 at 9:52 PM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> Hi Jeff,
>
> On Fri, 22 Oct 2021 18:07:52 -0400
> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> > On Fri, 22 Oct 2021 02:53:35 +0800
> > Jeff Xie <xiehuan09@xxxxxxxxx> wrote:
> >
> > Hi Jeff!
> >
> >
> > > Introduce a method based on function tracer and kprobe/uprobe/trace_event/
> > > to trace any object in the linux kernel.
> > >
> > > Usage:
> > > When using kprobe/uprobe/trace_event/ we can use a new trigger(objfilter)
> > > to set object and tigger object trace.
> > >
> > > For example:
> > >
> > > For the function bio_add_page, we can trace the first argument:
> > >
> > > int bio_add_page(struct bio *bio, struct page *page,
> > > unsigned int len, unsigned int offset)
> > >
> > > [root@JeffXie ]# cd /sys/kernel/debug/tracing/
> > > [root@JeffXie tracing]# echo 'p bio_add_page arg1=$arg1' > kprobe_events
> > > [root@JeffXie tracing]# cd events/kprobes/p_bio_add_page_0/
> > > [root@JeffXie p_bio_add_page_0]# echo 'objfilter:arg1 if comm == "cat"' > ./trigger
>
> thanks for updating to use the trigger action!
>
> BTW, can you add a command to the objfilter syntax?
> I think sometimes we will trace the object in between some events. In that case,
> at the entrance event,
>
> objfilter:add:arg1
>
> and the exit event
>
> objfilter:del:arg1
>
> then user will only see the trace between the entrance and exit events.
>
> Also, please add me to Cc list :-)
>
>
> > > [root@JeffXie p_bio_add_page_0]# cat /test.txt
> > > [root@JeffXie p_bio_add_page_0]# cd /sys/kernel/debug/tracing/
> > > [root@JeffXie tracing]# cat ./trace
> > > # tracer: nop
> > > #
> > > # entries-in-buffer/entries-written: 81/81 #P:4
> > > #
> > > # _-----=> irqs-off
> > > # / _----=> need-resched
> > > # | / _---=> hardirq/softirq
> > > # || / _--=> preempt-depth
> > > # ||| / _-=> migrate-disable
> > > # |||| / delay
> > > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> > > # | | | ||||| | |
> > > cat-122 [001] ..... 111.193997: bio_add_page <-ext4_mpage_readpages object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.193998: __bio_try_merge_page <-bio_add_page object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.193998: __bio_add_page <-bio_add_page object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.193998: submit_bio <-ext4_mpage_readpages object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.193998: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.193999: __submit_bio <-submit_bio_noacct object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.193999: submit_bio_checks <-__submit_bio object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.193999: __cond_resched <-submit_bio_checks object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.193999: rcu_all_qs <-__cond_resched object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.194000: should_fail_bio <-submit_bio_checks object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.194001: blk_mq_submit_bio <-__submit_bio object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.194001: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.194001: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.194002: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.194002: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888102a4b900
> > > cat-122 [001] ..... 111.194003: blk_account_io_start <-blk_mq_submit_bio object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194212: bio_advance <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194213: bio_endio <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194213: mpage_end_io <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194213: __read_end_io <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194218: bio_put <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194218: bio_free <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194218: bio_free <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194218: bvec_free <-bio_free object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194219: mempool_free <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194219: mempool_free <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194219: mempool_free_slab <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194219: mempool_free_slab <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194219: kmem_cache_free <-blk_update_request object:0xffff888102a4b900
> > > <idle>-0 [001] d.... 111.194219: kmem_cache_free <-blk_update_request object:0xffff888102a4b900
> > >
> > > Signed-off-by: Jeff Xie <xiehuan09@xxxxxxxxx>
> > > ---
> > > So far, it can trace the object from kprobe/uprobe/syscall tracepoint
> > > and now it doesn't yet support the count for objfilter.
>
> That's OK, you can add features step by step (and that is preferrable)

Thank you so much for your encouragement!

>
> > >
> > > include/linux/ftrace.h | 16 +++
> > > include/linux/trace_events.h | 1 +
> > > kernel/trace/Kconfig | 7 +
> > > kernel/trace/Makefile | 1 +
> > > kernel/trace/trace.h | 7 +
> > > kernel/trace/trace_entries.h | 17 +++
> > > kernel/trace/trace_events_trigger.c | 122 ++++++++++++++++++
> > > kernel/trace/trace_kprobe.c | 1 +
> > > kernel/trace/trace_object.c | 191 ++++++++++++++++++++++++++++
> > > kernel/trace/trace_output.c | 40 ++++++
> > > kernel/trace/trace_syscalls.c | 1 +
> > > kernel/trace/trace_uprobe.c | 1 +
> > > 12 files changed, 405 insertions(+)
> > > create mode 100644 kernel/trace/trace_object.c
> > >
> > > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > > index 832e65f06754..2d7e62bff6c5 100644
> > > --- a/include/linux/ftrace.h
> > > +++ b/include/linux/ftrace.h
> > > @@ -1078,4 +1078,20 @@ unsigned long arch_syscall_addr(int nr);
> > >
> > > #endif /* CONFIG_FTRACE_SYSCALLS */
> > >
> > > +struct trace_event_file;
> > > +
> > > +#ifdef CONFIG_TRACE_OBJECT
> > > +int init_trace_object(void);
> > > +int exit_trace_object(void);
> > > +void set_trace_object(void *obj);
> > > +void record_trace_object(struct trace_event_file *trace_file,
> > > + struct pt_regs *regs);
> > > +#else
> > > +static inline int init_trace_object(void) { return 0; }
> > > +static inline int exit_trace_object(void) { return 0; }
> > > +static inline void set_trace_object(void *obj) { }
> > > +static inline void record_trace_object(struct trace_event_file *trace_file,
> > > + struct pt_regs *regs) { }
> > > +#endif /* CONFIG_TRACE_OBJECT */
> > > +
> > > #endif /* _LINUX_FTRACE_H */
> > > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> > > index 3e475eeb5a99..f6dcaca9c3fe 100644
> > > --- a/include/linux/trace_events.h
> > > +++ b/include/linux/trace_events.h
> > > @@ -684,6 +684,7 @@ enum event_trigger_type {
> > > ETT_EVENT_HIST = (1 << 4),
> > > ETT_HIST_ENABLE = (1 << 5),
> > > ETT_EVENT_EPROBE = (1 << 6),
> > > + ETT_TRACE_OBJECT = (1 << 7),
> > > };
> > >
> > > extern int filter_match_preds(struct event_filter *filter, void *rec);
> > > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > > index 420ff4bc67fd..d302a0f085f9 100644
> > > --- a/kernel/trace/Kconfig
> > > +++ b/kernel/trace/Kconfig
> > > @@ -237,6 +237,13 @@ config FUNCTION_PROFILER
> > >
> > > If in doubt, say N.
> > >
> > > +config TRACE_OBJECT
> > > + bool "Trace kernel object"
> > > + depends on FUNCTION_TRACER
> > > + default y
> > > + help
> > > + This help kernel developer to trace any kernel object.
>
> It is better to add more comment for this feature.
>
> Could you also add another patch to update the trigger documentation
> (or add a new document) under Documents/trace for this feature?

Okay, I will finish this part next.

> > > +
> > > config STACK_TRACER
> > > bool "Trace max stack"
> > > depends on HAVE_FUNCTION_TRACER
> > > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> > > index 6de5d4d63165..6d9e78a488aa 100644
> > > --- a/kernel/trace/Makefile
> > > +++ b/kernel/trace/Makefile
> > > @@ -66,6 +66,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
> > > obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
> > > obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
> > > obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
> > > +obj-$(CONFIG_TRACE_OBJECT) += trace_object.o
> > > ifeq ($(CONFIG_BLOCK),y)
> > > obj-$(CONFIG_EVENT_TRACING) += blktrace.o
> > > endif
> > > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > > index b7c0f8e160fb..6ac8f5aca070 100644
> > > --- a/kernel/trace/trace.h
> > > +++ b/kernel/trace/trace.h
> > > @@ -49,6 +49,7 @@ enum trace_type {
> > > TRACE_TIMERLAT,
> > > TRACE_RAW_DATA,
> > > TRACE_FUNC_REPEATS,
> > > + TRACE_OBJECT,
> > >
> > > __TRACE_LAST_TYPE,
> > > };
> > > @@ -460,6 +461,7 @@ extern void __ftrace_bad_type(void);
> > > TRACE_GRAPH_RET); \
> > > IF_ASSIGN(var, ent, struct func_repeats_entry, \
> > > TRACE_FUNC_REPEATS); \
> > > + IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
> > > __ftrace_bad_type(); \
> > > } while (0)
> > >
> > > @@ -1950,6 +1952,11 @@ struct trace_min_max_param {
> > > u64 *max;
> > > };
> > >
> > > +struct object_trigger_param {
> > > + struct pt_regs *regs;
> > > + int param;
> > > +};
> > > +
> > > #define U64_STR_SIZE 24 /* 20 digits max */
> > >
> > > extern const struct file_operations trace_min_max_fops;
> > > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > > index cd41e863b51c..bb120d9498a9 100644
> > > --- a/kernel/trace/trace_entries.h
> > > +++ b/kernel/trace/trace_entries.h
> > > @@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
> > > __entry->context,
> > > __entry->timer_latency)
> > > );
> > > +
> > > +/*
> > > + * trace object entry:
> > > + */
> > > +FTRACE_ENTRY(object, trace_object_entry,
> > > +
> > > + TRACE_OBJECT,
> > > +
> > > + F_STRUCT(
> > > + __field( unsigned long, ip )
> > > + __field( unsigned long, parent_ip )
> > > + __field( unsigned long, object )
> > > + ),
> > > +
> > > + F_printk(" %ps <-- %ps object:%lx\n",
> > > + (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> > > +);
> > > diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> > > index 3d5c07239a2a..d823854445e6 100644
> > > --- a/kernel/trace/trace_events_trigger.c
> > > +++ b/kernel/trace/trace_events_trigger.c
> >
> > OK, so you shouldn't be modifying this file.
> >
> > Look at how the histograms are done:
> >
> > kernel/trace/trace_events_hist.c
> >
> > All your code should be in your trace_object.c file.
> >
> > Although, looking at the trace_events_hist.c file myself, I see a lot of
> > duplicate code in event_hist_trigger_func() where we can add helper
> > functions to handle that.
> >
> > But for now, just duplicate the code ;-)
> >
> > I added Tom to the Cc, so that he knows where he needs to clean up his code!
> >
> >
> > > @@ -756,12 +756,36 @@ int set_trigger_filter(char *filter_str,
> > > struct event_filter *filter = NULL, *tmp;
> > > int ret = -EINVAL;
> > > char *s;
> > > + unsigned long param;
> > > + struct object_trigger_param *obj_param;
> > > + int len;
> > >
> > > if (!filter_str) /* clear the current filter */
> > > goto assign;
> > >
> > > s = strsep(&filter_str, " \t");
> > >
> > > + if (data->cmd_ops->trigger_type == ETT_TRACE_OBJECT) {
> > > + obj_param = data->private_data;
> > > + len = str_has_prefix(s, "arg");
>
> This seems that you only accept the "argN" instead of the parameter name.
> kprobes event automatically use such name, but the normal events usually
> have named parameters.
> You can use trace_find_event_field(file->event_call, s) to find the field
> by name, instead of using argN. And you can record the field offset and size.
>
> > > + if (!len)
> > > + goto out;
> > > + ret = kstrtoul(s + len, 10, &param);
> > > + if (ret || param == 0)
> > > + goto out;
> > > + obj_param = kmalloc(sizeof(*obj_param), GFP_KERNEL);
> > > + if (!obj_param) {
> > > + ret = -ENOMEM;
> > > + goto out;
> > > + }
> > > + data->private_data = obj_param;
> > > + obj_param->param = param;
> > > + init_trace_object();
> > > + }
> > > + if (!strlen(s) || !filter_str)
> > > + goto out;
> > > +
> > > + s = strsep(&filter_str, " \t");
> > > if (!strlen(s) || strcmp(s, "if") != 0)
> > > goto out;
> > >
> > > @@ -1679,6 +1703,103 @@ static __init int register_trigger_traceon_traceoff_cmds(void)
> > > return ret;
> > > }
> > >
> > > +#ifdef CONFIG_TRACE_OBJECT
> > > +
> > > +static void
> > > +trace_object_trigger(struct event_trigger_data *data,
> > > + struct trace_buffer *buffer, void *rec,
> > > + struct ring_buffer_event *event)
> > > +{
> > > +
> > > + struct object_trigger_param *obj_param = data->private_data;
> > > + unsigned long param;
> > > +
> > > + param = regs_get_kernel_argument(obj_param->regs, obj_param->param - 1);
> > > + set_trace_object((void *)param);
>
> What I meant was to get the object from the trace buffer, not from
> the pt_regs. Using this regs_get_kernel_argument() will narrow the objfilter
> (or "objtrace", I reconsider the name) only for the function entry.
>
> But if you get the object parameter from the recorded event, your tracer
> can trace the object which is recorded by the embedded (static) events too.
> That will make this feature more useful :)
>
> Using the size and the offset, you can get the data from the @rec parameter
> (@rec is the binary data and you can find the field by the offset).

I sounds very great, I will take some time to understand all your
suggestions in depth ;-)

> Thank you,
> >
>
>
> --
> Masami Hiramatsu <mhiramat@xxxxxxxxxx>

--
JeffXie