Re: [PATCH] net: check for dev pointer being NULL in dev_hard_header() to avoid GPF

From: George Kennedy
Date: Mon Dec 12 2022 - 08:34:57 EST


Hi Eric,

On 12/5/2022 10:21 PM, Eric Dumazet wrote:
On Tue, Dec 6, 2022 at 2:11 AM George Kennedy <george.kennedy@xxxxxxxxxx> wrote:
Hi Eric,

More info...

On 12/1/2022 11:11 PM, Eric Dumazet wrote:
On Thu, Dec 1, 2022 at 9:44 PM George Kennedy <george.kennedy@xxxxxxxxxx> wrote:

On 12/1/2022 2:25 PM, Eric Dumazet wrote:
On Thu, Dec 1, 2022 at 2:16 PM Pavan Chebbi <pavan.chebbi@xxxxxxxxxxxx> wrote:
On Wed, Nov 30, 2022 at 7:43 PM George Kennedy
<george.kennedy@xxxxxxxxxx> wrote:
The dev pointer can be NULL in dev_hard_header(). Add check for dev being
NULL in dev_hard_header() to avoid GPF.

general protection fault, probably for non-canonical address
0xdffffc0000000046: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000230-0x0000000000000237]
CPU: 1 PID: 45 Comm: kworker/1:1 Not tainted 6.1.0-rc7+ #2
Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0+20659+3dcf7c70
Workqueue: mld mld_ifc_work
RIP: 0010:macvlan_hard_header (./include/linux/netdevice.h:3057
(discriminator 4) drivers/net/macvlan.c:594 (discriminator 4))
RSP: 0018:ffff888103d377d0 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: ffff88801cf1a000 RCX: 0000000000000000
RDX: 0000000000000046 RSI: 0000000000000000 RDI: 0000000000000230
RBP: ffff88801e8ef328 R08: 0000000000000000 R09: 0000000000000060
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88801f0497c0
R13: 0000000000000000 R14: ffff888045187c98 R15: 0000000000000060
FS: 0000000000000000(0000) GS:ffff888106c80000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbf3f1c1840 CR3: 0000000014e36000 CR4: 00000000000006e0
Call Trace:
<TASK>
neigh_connected_output (./include/linux/netdevice.h:3060
net/core/neighbour.c:1595)
ip6_finish_output2 (./include/net/neighbour.h:546
net/ipv6/ip6_output.c:134)
ip6_finish_output (net/ipv6/ip6_output.c:195 net/ipv6/ip6_output.c:206)
ip6_output (./include/linux/netfilter.h:291 net/ipv6/ip6_output.c:227)
NF_HOOK.constprop.0 (./include/net/dst.h:445
./include/linux/netfilter.h:302)
mld_sendpack (net/ipv6/mcast.c:1824)
mld_send_cr (net/ipv6/mcast.c:2122)
mld_ifc_work (net/ipv6/mcast.c:2655)
process_one_work (kernel/workqueue.c:2294)
worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2437)
kthread (kernel/kthread.c:376)
ret_from_fork (arch/x86/entry/entry_64.S:312)
</TASK>
Modules linked in:
Dumping ftrace buffer:
(ftrace buffer empty)
---[ end trace 0000000000000000 ]---

Fixes: 0c4e85813d0a ("[NET]: Wrap netdevice hardware header creation.")
Reported-by: syzkaller <syzkaller@xxxxxxxxxxxxxxxx>
Signed-off-by: George Kennedy <george.kennedy@xxxxxxxxxx>
---
include/linux/netdevice.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index eddf8ee270e7..9b25a6301fa5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3054,7 +3054,7 @@ static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev,
const void *daddr, const void *saddr,
unsigned int len)
{
- if (!dev->header_ops || !dev->header_ops->create)
+ if (!dev || !dev->header_ops || !dev->header_ops->create)
Do you have a repro ?
See syzkaller repros attached.

This patch will not prevent a crash later I think.
The repro ran overnight without failure with the patch applied.
Yes, but the patch is hiding a potential bug that might show up with
other 'repros'
The repro fails when these devices are configured (they appear to have a small MTU):

20: vxcan0@vxcan1: <NOARP,UP,LOWER_UP> mtu 72 qdisc noqueue state UP group default qlen 1000
link/can
inet 172.20.20.38/24 scope global vxcan0
valid_lft forever preferred_lft forever
21: vxcan1@vxcan0: <NOARP,UP,LOWER_UP> mtu 72 qdisc noqueue state UP group default qlen 1000
link/can
inet 172.20.20.39/24 scope global vxcan1
valid_lft forever preferred_lft forever


# diff ../config.fail .config
3325c3325
< CONFIG_CAN_VXCAN=y
---
# CONFIG_CAN_VXCAN is not set
Thanks,
George
Small MTU has caused numerous issues in the past.

I am pretty sure we miss some READ_ONCE(dev->mtu) and other safety checks.
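
To make the READ_ONCE(dev->mtu) remark concrete, here is a minimal userspace sketch of the pattern: take one snapshot of a field that may change concurrently, then use that snapshot for both the check and the arithmetic. Everything in it is an assumption for illustration: READ_ONCE_SKETCH, struct net_device_sketch and payload_room_sketch() are hypothetical stand-ins, not kernel code, and the macro relies on the GCC/Clang __typeof__ extension. It does not claim to show where the missing checks actually are.

```c
#include <assert.h>   /* only needed by callers that self-check */
#include <stddef.h>

/*
 * Userspace approximation of the kernel's READ_ONCE(): force a single
 * load through a volatile-qualified pointer. Hypothetical stand-in.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the one field of interest. */
struct net_device_sketch {
    unsigned int mtu;
};

/*
 * Return how many payload bytes fit after a header of hdr_len bytes,
 * or 0 if the MTU cannot even hold the header.
 *
 * The MTU is read exactly once. Reading dev->mtu separately for the
 * check and for the subtraction would let a concurrent MTU change
 * slip in between the two and make the unsigned subtraction wrap.
 */
static unsigned int payload_room_sketch(struct net_device_sketch *dev,
                                        unsigned int hdr_len)
{
    unsigned int mtu = READ_ONCE_SKETCH(dev->mtu);

    if (mtu <= hdr_len)
        return 0;

    return mtu - hdr_len;
}
```

With the MTU of 72 shown for vxcan0/vxcan1 above and a 40-byte IPv6 header, the helper reports 32 bytes of room; with a header longer than the MTU it reports 0 instead of wrapping.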

I have not yet found the root cause of the vxcan-related GPF. What I do know is that both of the following are required for the GPF to occur:
1) The kernel must be built with CONFIG_CAN_VXCAN=y.
2) The C reproducer must set up the "vxcan" devices: even with CONFIG_CAN_VXCAN=y set, the GPF does not occur if "vxcan" is commented out of the C reproducer.

C reproducer with "vxcan" commented out (GPF will not occur):

# diff -C 3 repro_macvlan1.c repro_macvlan1_no_vsxcan.c
*** repro_macvlan1.c    2022-12-06 01:03:47.557094544 +0000
--- repro_macvlan1_no_vsxcan.c    2022-12-12 13:15:05.293719169 +0000
***************
*** 884,890 ****
        {"vcan", "vcan0"},           {"bond", "bond0"},
        {"team", "team0"},           {"dummy", "dummy0"},
        {"nlmon", "nlmon0"},         {"caif", "caif0"},
!       {"batadv", "batadv0"},       {"vxcan", "vxcan1"},
        {"netdevsim", netdevsim},    {"veth", 0},
        {"xfrm", "xfrm0"},           {"wireguard", "wg0"},
        {"wireguard", "wg1"},        {"wireguard", "wg2"},
--- 884,893 ----
        {"vcan", "vcan0"},           {"bond", "bond0"},
        {"team", "team0"},           {"dummy", "dummy0"},
        {"nlmon", "nlmon0"},         {"caif", "caif0"},
!       {"batadv", "batadv0"},
! #ifdef VXCAN
!       {"vxcan", "vxcan1"},
! #endif
        {"netdevsim", netdevsim},    {"veth", 0},
        {"xfrm", "xfrm0"},           {"wireguard", "wg0"},
        {"wireguard", "wg1"},        {"wireguard", "wg2"},
***************
*** 923,930 ****
--- 926,935 ----
        {"hsr0", 0},
        {"dummy0", ETH_ALEN},
        {"nlmon0", 0},
+ #ifdef VXCAN
        {"vxcan0", 0, true},
        {"vxcan1", 0, true},
+ #endif
        {"caif0", ETH_ALEN},
        {"batadv0", ETH_ALEN},
        {netdevsim, ETH_ALEN},

I think that, until the root cause of the vxcan-related GPF is found, the "dev NULL check" patch should go in. It adds the dev check to the same line as the two existing NULL checks.
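
For reference, the shape of the check with the patch applied can be sketched in userspace as below. The *_sketch types and fake_create_sketch() are hypothetical stand-ins, not the kernel's definitions; only the condition itself mirrors the diff quoted above. Note that the sketch, like the patch, only makes the helper return 0 for a NULL dev; it says nothing about why the caller ended up with a NULL dev in the first place.

```c
#include <assert.h>   /* only needed by callers that self-check */
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's structures. */
struct sk_buff_sketch {
    int unused;
};

struct net_device_sketch;

struct header_ops_sketch {
    int (*create)(struct sk_buff_sketch *skb,
                  struct net_device_sketch *dev,
                  unsigned short type, const void *daddr,
                  const void *saddr, unsigned int len);
};

struct net_device_sketch {
    const struct header_ops_sketch *header_ops;
};

/*
 * Mirrors the patched condition: a NULL dev is treated the same way as
 * a device without header_ops or without a create() callback, and the
 * helper returns 0 without dereferencing anything.
 */
static int dev_hard_header_sketch(struct sk_buff_sketch *skb,
                                  struct net_device_sketch *dev,
                                  unsigned short type,
                                  const void *daddr, const void *saddr,
                                  unsigned int len)
{
    if (!dev || !dev->header_ops || !dev->header_ops->create)
        return 0;

    return dev->header_ops->create(skb, dev, type, daddr, saddr, len);
}

/* Hypothetical create() callback that reports a 14-byte header. */
static int fake_create_sketch(struct sk_buff_sketch *skb,
                              struct net_device_sketch *dev,
                              unsigned short type, const void *daddr,
                              const void *saddr, unsigned int len)
{
    (void)skb; (void)dev; (void)type;
    (void)daddr; (void)saddr; (void)len;
    return 14;
}
```

Calling dev_hard_header_sketch() with dev == NULL returns 0, as does a device whose header_ops is NULL; a device wired to fake_create_sketch() returns 14.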

Could you try the original C reproducer on a kernel built with CONFIG_CAN_VXCAN=y to see if you have any insight into the root cause of the GPF?

Thank you,
George