Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Dmitry Vyukov
Date: Tue Oct 03 2017 - 11:19:57 EST
On Mon, Oct 2, 2017 at 4:42 PM, 'Eric Dumazet' via syzkaller
> On Mon, Oct 2, 2017 at 7:21 AM, Mark Rutland <mark.rutland@xxxxxxx> wrote:
>> Hi Eric,
>> On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
>>> On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@xxxxxxx> wrote:
>>> > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
>>> > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
>>> > skb_copy_and_csum_bits().
>>> > kernel BUG at net/core/skbuff.c:2626!
>>> > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
>>> > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
>>> > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
>>> > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
>>> > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
>>> > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
>>> > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
>>> > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
>>> > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
>>> > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
>>> > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
>>> > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
>>> > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
>>> > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
>>> > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
>>> > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
>>> > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
>>> > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
>>> > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
>>> > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
>>> > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
>>> > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
>>> > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
>>> > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
>>> > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
>>> > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
>>> > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
>>> > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
>>> > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
>>> > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
>>> This is most likely a bug caused by syzkaller setting a ridiculous MTU
>>> on loopback device, below minimum size of ipv4 MTU.
>>> I tried to track it in August , but it seems hard to find all the
>>> issues with this.
>>> commit c780a049f9bf442314335372c9abc4548bfe3e44
>>> Author: Eric Dumazet <edumazet@xxxxxxxxxx>
>>> Date: Wed Aug 16 11:09:12 2017 -0700
>>> ipv4: better IP_MAX_MTU enforcement
>>> While working on yet another syzkaller report, I found
>>> that our IP_MAX_MTU enforcements were not properly done.
>>> gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
>>> final result can be bigger than IP_MAX_MTU :/
>>> This is a problem because device mtu can be changed on other cpus or
>>> While this patch does not fix the issue I am working on, it is
>>> probably worth addressing it.
>> Just to check I've understood correctly, are you suggesting that the
>> IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
>> doesn't seem to exist today)?
> We have plenty of places this is checked.
> For example, trying to set MTU < 68 usually removes IPv4 addresses and routes.
> Problem is : these checks are not fool proof yet.
> ( Only the admin was supposed to play these games )
>> Otherwise, I do spot another potential issue. The writer side (e.g. most
>> net_device::ndo_change_mtu implementations and the __dev_set_mtu()
>> fallback) doesn't use WRITE_ONCE().
> It does not matter how many strange values can be observed by the reader :
> We must be fool proof anyway from reader point of view, so the
> WRITE_ONCE() is not strictly needed.
Note if writer stores some temporal garbage there (which C language
perfectly allows), it does not matter what we do on reader side --
reader won't get correct data anyway. Say mtu changes from 1000 to
2000, but writer temporary stores 1 there, reader can observe 1 while
it must not. Synchronization is always a game of two.