On Mon, Oct 9, 2023 at 11:43 AM Yajun Deng <yajun.deng@xxxxxxxxx> wrote:
C compiler decides to inline or not, depending on various factors.
On 2023/10/9 17:30, Eric Dumazet wrote:
On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng <yajun.deng@xxxxxxxxx> wrote:
On 2023/10/9 16:20, Eric Dumazet wrote:I think you are very confused.
On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng <yajun.deng@xxxxxxxxx> wrote:But somehow the following code isn't inline? They didn't need to add the
On 2023/10/9 15:53, Eric Dumazet wrote:The function was fine, you do not need anything like push or pop.
On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng <yajun.deng@xxxxxxxxx> wrote:Yes, you are right. It needs to add the 'noinline' prefix. The
'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will makeI do not think you made sure netdev_core_stats_inc() was never inlined.
the trace work well.
They all have 'pop' instructions in them. This may be the key to making
the trace work well.
Hi all,
I need your help on percpu and ftrace.
Adding more code in it is simply changing how the compiler decides to
inline or not.
disassembly code will have 'pop'
instruction.
The only needed stuff was the call __fentry__.
The fact that the function was inlined for some invocations was the
issue, because the trace point
is only planted in the out of line function.
'noinline' prefix.
+ field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+ WRITE_ONCE(*field, READ_ONCE(*field) + 1);
Or
+ (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
You only want to trace netdev_core_stats_inc() entry point, not
arbitrary pieces of it.
Yes, I will trace netdev_core_stats_inc() entry point. I mean to replace
+ field = (__force unsigned long
__percpu *)((__force void *)p + offset);
+ this_cpu_inc(*field);
with
+ field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+ WRITE_ONCE(*field, READ_ONCE(*field) + 1);
Or
+ (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
The netdev_core_stats_inc() entry point will work fine even if it doesn't
have 'noinline' prefix.
I don't know why this code needs to add 'noinline' prefix.
+ field = (__force unsigned long __percpu *)((__force void *)p + offset);
+ this_cpu_inc(*field);
The most efficient (and small) code is generated by this_cpu_inc()
version, allowing the compiler to inline it.
If you copy/paste this_cpu_inc() twenty times, then the compiler
would not inline the function anymore.