Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626

From: Eric Dumazet
Date: Mon Oct 02 2017 - 10:42:25 EST


On Mon, Oct 2, 2017 at 7:21 AM, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> Hi Eric,
>
> On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
>> On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@xxxxxxx> wrote:
>> > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
>> > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
>> > skb_copy_and_csum_bits().
>
>> > kernel BUG at net/core/skbuff.c:2626!
>
>> > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
>> > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
>> > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
>> > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
>> > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
>> > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
>> > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
>> > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
>> > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
>> > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
>> > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
>> > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
>> > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
>> > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
>> > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
>> > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
>> > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
>> > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
>> > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
>> > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
>> > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
>> > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
>> > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
>> > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
>> > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
>> > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
>> > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
>> > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
>> > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
>> > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
>
>> This is most likely a bug caused by syzkaller setting a ridiculous MTU
>> on loopback device, below minimum size of ipv4 MTU.
>
>> I tried to track it in August [1], but it seems hard to find all the
>> issues with this.
>>
>> commit c780a049f9bf442314335372c9abc4548bfe3e44
>> Author: Eric Dumazet <edumazet@xxxxxxxxxx>
>> Date: Wed Aug 16 11:09:12 2017 -0700
>>
>> ipv4: better IP_MAX_MTU enforcement
>>
>> While working on yet another syzkaller report, I found
>> that our IP_MAX_MTU enforcements were not properly done.
>>
>> gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
>> final result can be bigger than IP_MAX_MTU :/
>>
>> This is a problem because device mtu can be changed on other cpus or
>> threads.
>>
>> While this patch does not fix the issue I am working on, it is
>> probably worth addressing it.
>
> Just to check I've understood correctly, are you suggesting that the
> IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
> doesn't seem to exist today)?

We have plenty of places this is checked.

For example, trying to set MTU < 68 usually removes IPv4 addresses and routes.

Problem is : these checks are not fool proof yet.

( Only the admin was supposed to play these games )

>
> Otherwise, I do spot another potential issue. The writer side (e.g. most
> net_device::ndo_change_mtu implementations and the __dev_set_mtu()
> fallback) doesn't use WRITE_ONCE().

It does not matter how many strange values can be observed by the reader :
We must be fool proof anyway from reader point of view, so the
WRITE_ONCE() is not strictly needed.


>
> IIUC, that means that the write could be torn across multiple accesses,
> and we could see dev->mtu < dev->min_mtu on the read side, even if we
> use READ_ONCE(), and sanity check the mtu value before calling
> __dev_set_mtu().