Re: [PATCH net] udp: fix segmentation crash for untrusted source packet

From: Willem de Bruijn
Date: Sat Mar 16 2024 - 09:47:27 EST


Lena Wang (王娜) wrote:
> On Wed, 2024-03-13 at 16:41 +0100, Paolo Abeni wrote:
> >
> > On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote:
> > > Kernel exception is reported when making udp frag list segmentation.
> > > Backtrace is as below:
> > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:229
> > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:262
> > > features=features@entry=19, is_ipv6=false)
> > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:289
> > > features=19)
> > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:399
> > > features=19)
> > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/af_inet.c:1418
> > > skb@entry=0x0, features=19, features@entry=0)
> > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53
> > > tx_path=<optimized out>)
> > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124
> >
> > A full backtrace would help better understanding the issue.
>
> Below is full backtrace:
> [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W OE 6.6.17-android15-0-g380371ea9bf1 #1
> [ 1100.812211][ C3] Hardware name: MT6991(ENG) (DT)
> [ 1100.812215][ C3] Call trace:
> [ 1100.812218][ C3] dump_backtrace+0xec/0x138
> [ 1100.812222][ C3] show_stack+0x18/0x24
> [ 1100.812226][ C3] dump_stack_lvl+0x50/0x6c
> [ 1100.812232][ C3] dump_stack+0x18/0x24
> [ 1100.812237][ C3] mrdump_common_die+0x24c/0x388 [mrdump]
> [ 1100.812259][ C3] ipanic_die+0x20/0x34 [mrdump]
> [ 1100.812269][ C3] notifier_call_chain+0x90/0x174
> [ 1100.812275][ C3] notify_die+0x50/0x8c
> [ 1100.812279][ C3] die+0x94/0x308
> [ 1100.812283][ C3] __do_kernel_fault+0x240/0x26c
> [ 1100.812288][ C3] do_page_fault+0xa0/0x48c
> [ 1100.812293][ C3] do_translation_fault+0x38/0x54
> [ 1100.812297][ C3] do_mem_abort+0x58/0x104
> [ 1100.812302][ C3] el1_abort+0x3c/0x5c
> [ 1100.812307][ C3] el1h_64_sync_handler+0x54/0x90
> [ 1100.812313][ C3] el1h_64_sync+0x68/0x6c
> [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4
> [ 1100.812322][ C3] udp4_ufo_fragment+0x130/0x174
> [ 1100.812326][ C3] inet_gso_segment+0x164/0x330
> [ 1100.812330][ C3] skb_mac_gso_segment+0xc4/0x13c
> [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120
> [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134
> [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114
> [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac
> [ 1100.812353][ C3] __udp4_lib_rcv+0x3e0/0x818
> [ 1100.812358][ C3] udp_rcv+0x20/0x30
> [ 1100.812362][ C3] ip_protocol_deliver_rcu+0x194/0x368
> [ 1100.812368][ C3] ip_local_deliver+0xe4/0x184
> [ 1100.812373][ C3] ip_rcv+0x90/0x118
> [ 1100.812378][ C3] __netif_receive_skb+0x74/0x124
> [ 1100.812383][ C3] process_backlog+0xd8/0x18c
> [ 1100.812388][ C3] __napi_poll+0x5c/0x1fc
> [ 1100.812392][ C3] net_rx_action+0x150/0x334
> [ 1100.812397][ C3] __do_softirq+0x120/0x3f4
> [ 1100.812401][ C3] ____do_softirq+0x10/0x20
> [ 1100.812405][ C3] call_on_irq_stack+0x3c/0x74
> [ 1100.812410][ C3] do_softirq_own_stack+0x1c/0x2c
> [ 1100.812414][ C3] __irq_exit_rcu+0x5c/0xd4
> [ 1100.812418][ C3] irq_exit_rcu+0x10/0x1c
> [ 1100.812422][ C3] el1_interrupt+0x38/0x58
> [ 1100.812428][ C3] el1h_64_irq_handler+0x18/0x24
> [ 1100.812434][ C3] el1h_64_irq+0x68/0x6c
> [ 1100.812437][ C3] arch_local_irq_enable+0x4/0x8
> [ 1100.812443][ C3] cpuidle_enter+0x38/0x54
> [ 1100.812449][ C3] do_idle+0x198/0x294
> [ 1100.812454][ C3] cpu_startup_entry+0x34/0x3c
> [ 1100.812459][ C3] secondary_start_kernel+0x138/0x158
> [ 1100.812465][ C3] __secondary_switched+0xc0/0xc4
>
> > > This packet's frag list is null while gso_type is not 0. Then it is treated
> > > as a GRO-ed packet and sent to segment frag list. Function call path is
> > > udp_rcv_segment => config features value
> > > __udpv4_gso_segment => skb_gso_ok returns false. Here it should be
> > > true.
> >
> > Why? If I read correctly the above, this is GSO packet landing in an
> > UDP socket with no UDP_GRO sockopt. The packet is expected to be
> > segmented again.
> >
> Yes, it is a GSO packet, however the fragment list of this GSO packet
> becomes NULL. As the occurrence rate is very low, we really don’t know
> why and when it becomes NULL. It happens on both cellular and
> wlan networks and seems to be an unknown kernel issue.
>
> To avoid the crash, segmentation should be skipped when the fraglist is
> NULL.
>
> > > Failed reason is features doesn't match
> > > gso_type.
> > > __udp_gso_segment_list
> > > skb_segment_list => packet is linear with skb->next = NULL
> > > __udpv4_gso_segment_list_csum => use skb->next directly and
> > > crash happens
> > >
> > > In rx-gro-list GRO-ed packet is set gso type as
> > > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In gso
> > > flow the features should also set them to match with gso_type. Or else it
> > > will always return false in skb_gso_ok. Then it can't discover the
> > > untrusted source packet and result crash in following function.
> >
> > What is the 'untrusted source' here? I read the above as the packet
> > aggregation happened in the GRO engine???
> >
> > Could you please give a complete description of the relevant
> > scenario?
> >
>
> According to the backtrace info, we infer it is a rx-frag_list GRO

It would be helpful to see an skb_dump. But if this happens rarely in
production, understood if that is not feasible.

The packet arrives on process_backlog, so still not sure how it is
produced.

> packet. Before being sent into the UDP socket with no UDP_GRO sockopt, it
> seems to enter "skb_condense", which trims it and loses its frag list. However
> it still keeps gso_type and gso_size. Then it continues on to
> skb_segment_list.
>
> The first crash happens in skb_segment_list.
> This patch resolves that crash and lets the packet become an skb without
> skb->next:
> https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/
> Then the crash moves to __udp_gso_segment_list -> skb_segment_list(finish)
> -> __udpv4_gso_segment_list_csum, which uses skb->next without a check and
> crashes.
>
>
> What we want to do is to drop this abnormal packet.

I think we want to deliver this packet if possible.

Thanks for the added context. So this is assumed to be a GSO skb with
SKB_GSO_FRAGLIST that somewhere lost its fraglist? That is the bug
if true.
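
To make the suspected state concrete, here is a minimal userspace sketch in C.
It is not kernel code: struct toy_skb, the TOY_GSO_* flags and both helper
functions are hypothetical stand-ins invented for illustration. It only models
the two points discussed here, namely an skb whose gso_type still claims
fraglist GSO after its frag_list is gone, and a walk over ->next that must
tolerate a list with no second entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only, NOT the kernel's definitions. */
#define TOY_GSO_FRAGLIST (1u << 0)
#define TOY_GSO_UDP_L4   (1u << 1)

struct toy_skb {
    struct toy_skb *next;      /* next segment after list segmentation */
    struct toy_skb *frag_list; /* chained segments built by fraglist GRO */
    unsigned int gso_type;     /* TOY_GSO_* flags */
    unsigned int gso_size;
};

/* True when the skb claims fraglist GSO but has no frag_list behind it. */
bool toy_fraglist_state_is_stale(const struct toy_skb *skb)
{
    return (skb->gso_type & TOY_GSO_FRAGLIST) && skb->frag_list == NULL;
}

/* Count segments by walking ->next; a NULL head or a single entry is fine. */
size_t toy_count_segments(const struct toy_skb *segs)
{
    size_t n = 0;

    for (; segs != NULL; segs = segs->next)
        n++;
    return n;
}
```

In this model, an skb with TOY_GSO_FRAGLIST set and frag_list == NULL is
reported as stale, and counting its segments yields 1 rather than
dereferencing a missing ->next.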

You are suggesting that this happens in the skb_condense in
__udp_enqueue_schedule_skb?

If it was generated by GRO, then that happened on a device that has
NETIF_F_GRO_FRAGLIST set. So one workaround (not a fix) is to disable that
feature.

> So we set features
> NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST to match the condition from
> fixes: f2696099c6c6, then drop it.