Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

From: Yajun Deng
Date: Sun Oct 08 2023 - 04:52:17 EST

Next message: Jack Zhu: "[PATCH v10 8/8] media: staging: media: starfive: camss: Register devices"
Previous message: Sudeep Holla: "Re: [PATCH -next] firmware: arm_scmi: Remove unneeded semicolon"
In reply to: Eric Dumazet: "Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()"
Next in thread: Eric Dumazet: "Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2023/10/8 15:18, Eric Dumazet wrote:

On Sun, Oct 8, 2023 at 9:00 AM Yajun Deng <yajun.deng@xxxxxxxxx> wrote:

On 2023/10/8 14:45, Eric Dumazet wrote:

On Sat, Oct 7, 2023 at 8:34 AM Yajun Deng <yajun.deng@xxxxxxxxx> wrote:

On 2023/10/7 13:29, Eric Dumazet wrote:

On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@xxxxxxxxx> wrote:

Although there is a kfree_skb_reason() helper function that can be used to
find the reason why this skb is dropped, but most callers didn't increase
one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.

...

+
+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+ /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+ struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+ unsigned long *field;
+
+ if (unlikely(!p))
+ p = netdev_core_stats_alloc(dev);
+
+ if (p) {
+ field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+ WRITE_ONCE(*field, READ_ONCE(*field) + 1);

This is broken...

As I explained earlier, dev_core_stats_xxxx(dev) can be called from
many different contexts:

1) process contexts, where preemption and migration are allowed.
2) interrupt contexts.

Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.

I _think_ I already gave you how to deal with this ?

Yes, I replied in v6.

https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@xxxxxxxxx/

Please try instead:

+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+ /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+ struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+ unsigned long __percpu *field;
+
+ if (unlikely(!p)) {
+ p = netdev_core_stats_alloc(dev);
+ if (!p)
+ return;
+ }
+ field = (__force unsigned long __percpu *)((__force void *)p + offset);
+ this_cpu_inc(*field);
+}

This wouldn't trace anything even the rx_dropped is in increasing. It
needs to add an extra operation, such as:

I honestly do not know what you are talking about.

Have you even tried to change your patch to use

field = (__force unsigned long __percpu *)((__force void *)p + offset);
this_cpu_inc(*field);

Yes, I tested this code. But the following couldn't show anything even
if the rx_dropped is increasing.

'sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc'

Well, I am not sure about this, "bpftrace" worked for me.

Make sure your toolchain generates something that looks like what I got:

000000000000ef20 <netdev_core_stats_inc>:
ef20: f3 0f 1e fa endbr64
ef24: e8 00 00 00 00 call ef29 <netdev_core_stats_inc+0x9>
ef25: R_X86_64_PLT32 __fentry__-0x4
ef29: 55 push %rbp
ef2a: 48 89 e5 mov %rsp,%rbp
ef2d: 53 push %rbx
ef2e: 89 f3 mov %esi,%ebx
ef30: 48 8b 87 f0 01 00 00 mov 0x1f0(%rdi),%rax
ef37: 48 85 c0 test %rax,%rax
ef3a: 74 0b je ef47 <netdev_core_stats_inc+0x27>
ef3c: 89 d9 mov %ebx,%ecx
ef3e: 65 48 ff 04 08 incq %gs:(%rax,%rcx,1)
ef43: 5b pop %rbx
ef44: 5d pop %rbp
ef45: c3 ret
ef46: cc int3
ef47: e8 00 00 00 00 call ef4c <netdev_core_stats_inc+0x2c>
ef48: R_X86_64_PLT32 .text.unlikely.+0x13c
ef4c: 48 85 c0 test %rax,%rax
ef4f: 75 eb jne ef3c <netdev_core_stats_inc+0x1c>
ef51: eb f0 jmp ef43 <netdev_core_stats_inc+0x23>
ef53: 66 66 66 66 2e 0f 1f data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
ef5a: 84 00 00 00 00 00

I'll share some I can see it.

1.

objdump -D vmlinux

ffffffff81b2f170 <netdev_core_stats_inc>:
ffffffff81b2f170:    e8 8b ea 55 ff           callq ffffffff8108dc00 <__fentry__>
ffffffff81b2f175:    55                       push   %rbp
ffffffff81b2f176:    48 89 e5                 mov    %rsp,%rbp
ffffffff81b2f179:    48 83 ec 08              sub    $0x8,%rsp
ffffffff81b2f17d:    48 8b 87 e8 01 00 00     mov 0x1e8(%rdi),%rax
ffffffff81b2f184:    48 85 c0                 test   %rax,%rax
ffffffff81b2f187:    74 0d                    je ffffffff81b2f196 <netdev_core_stats_inc+0x26>
ffffffff81b2f189:    89 f6                    mov    %esi,%esi
ffffffff81b2f18b:    65 48 ff 04 30           incq %gs:(%rax,%rsi,1)
ffffffff81b2f190:    c9                       leaveq
ffffffff81b2f191:    e9 aa 31 6d 00           jmpq ffffffff82202340 <__x86_return_thunk>
ffffffff81b2f196:    89 75 fc                 mov %esi,-0x4(%rbp)
ffffffff81b2f199:    e8 82 ff ff ff           callq ffffffff81b2f120 <netdev_core_stats_alloc>
ffffffff81b2f19e:    8b 75 fc                 mov -0x4(%rbp),%esi
ffffffff81b2f1a1:    48 85 c0                 test   %rax,%rax
ffffffff81b2f1a4:    75 e3                    jne ffffffff81b2f189 <netdev_core_stats_inc+0x19>
ffffffff81b2f1a6:    c9                       leaveq
ffffffff81b2f1a7:    e9 94 31 6d 00           jmpq ffffffff82202340 <__x86_return_thunk>
ffffffff81b2f1ac:    0f 1f 40 00              nopl   0x0(%rax)

2.

sudo cat /proc/kallsyms | grep netdev_core_stats_inc

ffffffff9c72f120 T netdev_core_stats_inc
ffffffff9ca2676c t netdev_core_stats_inc.cold
ffffffff9d5235e0 r __ksymtab_netdev_core_stats_inc

3.

➜ ~ ifconfig enp34s0f0
enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.10.30.88 netmask 255.255.255.0 broadcast 10.10.30.255
        inet6 fe80::6037:806c:14b6:f1ca prefixlen 64 scopeid 0x20<link>
        ether 04:d4:c4:5c:81:42 txqueuelen 1000 (Ethernet)
        RX packets 29024 bytes 3118278 (3.1 MB)
        RX errors 0 dropped 794 overruns 0 frame 0
        TX packets 16961 bytes 2662290 (2.6 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 29 memory 0x39fff4000000-39fff47fffff

➜ ~ ifconfig enp34s0f0
enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.10.30.88 netmask 255.255.255.0 broadcast 10.10.30.255
        inet6 fe80::6037:806c:14b6:f1ca prefixlen 64 scopeid 0x20<link>
        ether 04:d4:c4:5c:81:42 txqueuelen 1000 (Ethernet)
        RX packets 29272 bytes 3148997 (3.1 MB)
        RX errors 0 dropped 798 overruns 0 frame 0
        TX packets 17098 bytes 2683547 (2.6 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 29 memory 0x39fff4000000-39fff47fffff

The rx_dropped is increasing.

4.

sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc

TIME     PID     TID     COMM            FUNC

(Empty, I didn't see anything.)

5.

sudo trace-cmd record -p function -l netdev_core_stats_inc

sudo trace-cmd report

(Empty, I didn't see anything.)

If I add a 'pr_info("\n");'   like:

+      pr_info("\n");
        field = (__force unsigned long __percpu *)((__force void *)p + offset);
        this_cpu_inc(*field);

Everything is OK. The 'pr_info("\n");' can be changed to anything else, but not

without it.

Next message: Jack Zhu: "[PATCH v10 8/8] media: staging: media: starfive: camss: Register devices"
Previous message: Sudeep Holla: "Re: [PATCH -next] firmware: arm_scmi: Remove unneeded semicolon"
In reply to: Eric Dumazet: "Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()"
Next in thread: Eric Dumazet: "Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]