Re: [PATCH] tracing: Fix printing ptrs in preempt/irq enable/disable events

From: Antonio Borneo
Date: Fri Dec 06 2019 - 19:01:52 EST


On Wed, 2019-12-04 at 11:04 -0500, Joel Fernandes wrote:
> On Wed, Nov 27, 2019 at 04:44:28PM +0100, Antonio Borneo wrote:
> > This tracing event class is the only instance in kernel that logs
> > in the trace buffer the instruction pointer as offset to _stext,
> > instead of logging the full pointer.
> > This looks like a nice optimization for 64 bits platforms, where a
> > 32 bit offset can take less space than a full 64 bits pointer. But
> > the symbol _stext is incorrectly resolved as zero in the expansion
>
> I didn't get this. If _stext is 0 on any platform, then your patch doesn't
> really do anything because the offset will be equal to the ip.
>
> Could you provide an example with real numbers showing the overflow?
>

Thanks for your reply.

Easy to replicate, e.g. with a vanilla v5.4 on x86_64
make x86_64_defconfig
echo CONFIG_PREEMPTIRQ_EVENTS=y >> .config
make oldconfig; make

At login:
trace-cmd record -e preemptirq:irq_enable sleep 1
trace-cmd report trace.dat | head
and I incorrectly get only the offsets to _stext:
cpus=1
sleep-197 [000] 280.378022: irq_enable: caller=0x1afaS parent=0xc00091S
sleep-197 [000] 280.378026: irq_enable: caller=0x54d34S parent=0x78e00000S
sleep-197 [000] 280.378029: irq_enable: caller=0x1afaS parent=0xc00a84S
sleep-197 [000] 280.378032: irq_enable: caller=0x263dS parent=0x78e00000S
...

With this patch I get the expected/right result:
cpus=1
sleep-210 [000] 525.177583: irq_enable: caller=trace_hardirqs_on_thunk parent=entry_SYSCALL_64_after_hwframe
sleep-210 [000] 525.177587: irq_enable: caller=__do_page_fault parent=(nil)S
sleep-210 [000] 525.177590: irq_enable: caller=trace_hardirqs_on_thunk parent=swapgs_restore_regs_and_return_to_usermode
sleep-210 [000] 525.177593: irq_enable: caller=do_syscall_64 parent=(nil)S
...

The same correct result can be obtained by hardcoding _stext in TP_printk() (of course, disabling KASLR to use fixed _stext), e.g. on x86_64:
TP_printk("caller=%pS parent=%pS",
- (void *)((unsigned long)(_stext) + __entry->caller_offs),
- (void *)((unsigned long)(_stext) + __entry->parent_offs))
+ (void *)(0xffffffff81000000UL + __entry->caller_offs),
+ (void *)(0xffffffff81000000UL + __entry->parent_offs))
);

The problem is caused by having the macros TP_fast_assign() and TP_printk() expanded in different way.
The simpler expansion of the former macro resolves correctly the symbol _stext.
The more complex expansion of the latter macro passes through some stringification that at the end incorrectly expands _stext as 0.
At first I thought to fix the macro expansion of TP_printk(), but then I realized that:
- on 32 bit systems there is no gain in coding 32 bit offset instead of 32 bit pointer
- on 64 bit systems I cannot find a constraint that guarantees that modules are loaded within 32 bit offset from _stext
then, this patch!

> > of TP_printk(), which then prints only the hex offset instead of
> > the name of the caller function. Plus, on arm arch the kernel
> > modules are loaded at address lower than _stext, causing the u32
> > offset arithmetics to overflow and wrap at 32 bits.
>
> If we use signed 32-bit, will that solve the module issue?
>

Of course, on arm 32 bit using a signed offset would solve the 32 bit wraparound.

> > I did not identified a 64 bit arch where the modules are loaded at
> > offset from _stext that exceed u32 range, but I also did not
> > identified any constraint to feel safe with a u32 offset.
> >
> > Log directly the instruction pointer instead of the offset to
> > _stext.
>
> I am not comfortable with this patch at the moment, mainly because it will
> increase the size of this rather high frequency event. But I'm not saying
> there isn't an issue on 32-bit. Let's discuss more.
>

Sure, the pressure on the trace buffer will increase on a 64 bit system, but in general it also has more memory than an embedded 32 bit system.
The key point is, instead, can we feel safe with a 32 bit offset in a 64 bit system? Is there any guarantee a module is loaded within 32 bits offset from _stext?

Antonio

> thanks,
>
> - Joel
>
>
> > Signed-off-by: Antonio Borneo <antonio.borneo@xxxxxx>
> > Fixes: d59158162e03 ("tracing: Add support for preempt and irq enable/disable events")
> > ---
> > include/trace/events/preemptirq.h | 12 ++++++------
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
> > index 95fba0471e5b..d548a6aafa18 100644
> > --- a/include/trace/events/preemptirq.h
> > +++ b/include/trace/events/preemptirq.h
> > @@ -18,18 +18,18 @@ DECLARE_EVENT_CLASS(preemptirq_template,
> > TP_ARGS(ip, parent_ip),
> >
> > TP_STRUCT__entry(
> > - __field(u32, caller_offs)
> > - __field(u32, parent_offs)
> > + __field(unsigned long, caller_ip)
> > + __field(unsigned long, parent_ip)
> > ),
> >
> > TP_fast_assign(
> > - __entry->caller_offs = (u32)(ip - (unsigned long)_stext);
> > - __entry->parent_offs = (u32)(parent_ip - (unsigned long)_stext);
> > + __entry->caller_ip = ip;
> > + __entry->parent_ip = parent_ip;
> > ),
> >
> > TP_printk("caller=%pS parent=%pS",
> > - (void *)((unsigned long)(_stext) + __entry->caller_offs),
> > - (void *)((unsigned long)(_stext) + __entry->parent_offs))
> > + (void *)__entry->caller_ip,
> > + (void *)__entry->parent_ip)
> > );
> >
> > #ifdef CONFIG_TRACE_IRQFLAGS
> > --
> > 2.24.0
> >