Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()

From: Breno Leitao
Date: Tue Oct 08 2024 - 10:09:20 EST


Hello Paolo,

On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote:
> On 10/4/24 19:37, Breno Leitao wrote:
> > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
> > > On 10/4/24 10:27 AM, Breno Leitao wrote:
> > > > Branch annotation traces from approximately 200 IPv6-enabled hosts
> > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> > > > this commit adjusts the function to favor the IPv6 path.
> > > >
> > > > Swap the order of the conditional statements and move the 'likely'
> > > > annotation to the IPv6 case. This change aims to improve performance in
> > > > IPv6-dominant environments by reducing branch mispredictions.
> > > >
> > > > This optimization aligns with the trend of IPv6 becoming the default IP
> > > > version in many deployments, and should benefit modern network
> > > > configurations.
> > > >
> > > > Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
> > > > ---
> > > > include/net/route.h | 6 +++---
> > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/include/net/route.h b/include/net/route.h
> > > > index 1789f1e6640b..b90b7b1effb8 100644
> > > > --- a/include/net/route.h
> > > > +++ b/include/net/route.h
> > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> > > > struct net_device *dev = rt->dst.dev;
> > > > struct neighbour *neigh;
> > > > - if (likely(rt->rt_gw_family == AF_INET)) {
> > > > - neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > - } else if (rt->rt_gw_family == AF_INET6) {
> > > > + if (likely(rt->rt_gw_family == AF_INET6)) {
> > > > neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> > > > *is_v6gw = true;
> > > > + } else if (rt->rt_gw_family == AF_INET) {
> > > > + neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > } else {
> > > > neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> > > > }
> > >
> > > This is an IPv4 function allowing support for IPv6 addresses as a
> > > nexthop. It is appropriate for IPv4 family checks to be first.
> >
> > Right. In which case is this called on IPv6 only systems?
> >
> > On my IPv6-only 200 systems, the annotated branch predictor is showing
> > it is mispredicted 100% of the time.
>
> perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag;
> perf script
>
> should give you an hint.

Thanks. That proved to be very useful.

As I said above, all the hosts I have a webserver running, I see this
that likely mispredicted. Same for this server:

# cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw
correct incorrect % Function File Line
0 17127 100 ip_neigh_for_gw route.h 393

It is mostly coming from ip_finish_output2() and tcp_v4. Important to
say that these machine has no IPv4 configured, except 127.0.0.1
(localhost).

Output of `perf script`:

curl 3284017 [020] 342043.646674: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

curl 3284017 [020] 342043.646720: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

isc-net-0000 3286356 [026] 342055.690384: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

curl 3288713 [032] 342103.631991: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

curl 3288713 [032] 342103.632039: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

isc-net-0000 3289126 [021] 342115.725482: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

curl 3291018 [030] 342163.627633: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

curl 3291018 [030] 342163.627673: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

isc-net-0000 3291256 [031] 342175.683527: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

curl 3293421 [025] 342223.618198: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

curl 3293421 [025] 342223.618239: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

isc-net-0000 3293659 [034] 342235.695019: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

curl 3295399 [012] 342283.632642: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

curl 3295399 [012] 342283.632691: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

isc-net-0000 3295746 [001] 342295.712436: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

curl 3298603 [020] 342343.608814: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

curl 3298603 [020] 342343.608858: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

isc-net-0000 3299252 [032] 342355.693816: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

curl 3303255 [033] 342403.616685: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

curl 3303255 [033] 342403.616729: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

isc-net-0000 3304989 [011] 342415.740580: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

curl 3312952 [035] 342463.633808: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

curl 3312952 [035] 342463.633859: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

isc-net-0000 3314546 [032] 342475.766762: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

curl 3321983 [006] 342523.654221: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

curl 3321983 [006] 342523.654262: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])

isc-net-0000 3323932 [025] 342535.718587: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)