Re: GPF in rt6_uncached_list_flush_dev

From: Eric W. Biederman
Date: Mon Oct 12 2015 - 12:03:28 EST


Eric Dumazet <eric.dumazet@xxxxxxxxx> writes:

> On Mon, 2015-10-12 at 11:34 +0200, Dmitry Vyukov wrote:
>> Hello,
>>
>> The following program causes episodic crashes:
>>
>> // autogenerated by syzkaller (http://github.com/google/syzkaller)
>> #include <sched.h>
>> #define CLONE_NEWNET 0x40000000
>> int main(void)
>> {
>> unshare(CLONE_NEWNET);
>> }
>>
>> On commit dd36d7393d6310b0c1adefb22fba79c3cf8a577c
>> (git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
>>
>> general protection fault: 0000 [#1] SMP KASAN
>> Modules linked in:
>> CPU: 0 PID: 1058 Comm: kworker/u8:1 Not tainted 4.3.0-rc2+ #12
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Workqueue: netns cleanup_net
>> task: ffff880051c71a00 ti: ffff8800514f8000 task.ti: ffff8800514f8000
>> RIP: 0010:[<ffffffff82a6dad1>] [<ffffffff82a6dad1>] rt6_ifdown+0x481/0x740
>> RSP: 0018:ffff8800514ffaa0 EFLAGS: 00010246
>> RAX: dffffc0000000059 RBX: ffff88005107c580 RCX: 0000000000000002
>> RDX: 0000000000000000 RSI: 000000000000000f RDI: ffff880052a1f340
>> RBP: ffff8800514ffb78 R08: 0000000000000000 R09: ffff8800514ffb10
>> R10: ffff88002d5b7dc0 R11: ffff88002ec07600 R12: ffff880051c11140
>> R13: ffff88005144af40 R14: 0000000000000000 R15: dffffc0000000000
>> FS: 0000000000000000(0000) GS:ffff88002f000000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 0000000000648056 CR3: 0000000003610000 CR4: 00000000000006f0
>> Stack:
>> 00000000000002c8 1ffff1000a29ff5e dffffc0000000059 000000022d5b61c0
>> ffff880052a1f340 ffff880051c11140 ffff880052a1f348 ffff88005107c6d8
>> ffff88005107c598 0000000000000000 0000000041b58ab3 ffffffff83471ca6
>> Call Trace:
>> [<ffffffff82a6f830>] fib6_net_exit+0x20/0x100 net/ipv6/ip6_fib.c:1847
>> [<ffffffff8271fd9e>] ops_exit_list.isra.6+0xae/0x150
>> net/core/net_namespace.c:134
>> [<ffffffff82722c5d>] cleanup_net+0x3cd/0x730
>> net/core/net_namespace.c:431 (discriminator 3)
>> [<ffffffff81142161>] process_one_work+0x6d1/0x1370 kernel/workqueue.c:2030
>> [<ffffffff81142ee3>] worker_thread+0xe3/0x1300 kernel/workqueue.c:2162
>> [<ffffffff811552e7>] kthread+0x1e7/0x260 kernel/kthread.c:209
>> [<ffffffff82e4283f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:475
>> Code: 89 95 50 ff ff ff e8 6f 41 9f fe 48 8b 95 50 ff ff ff 48 39 95
>> 70 ff ff ff 0f 84 d5 fe ff ff e8 56 41 9f fe 48 8b 85 38 ff ff ff <80>
>> 38 00 0f 85 9b 01 00 00 48 8b 85 70 ff ff ff 48 8b 90 c8 02
>> RIP [< inline >] __read_once_size include/linux/compiler.h:207
>> RIP [< inline >] in6_dev_get include/net/addrconf.h:281
>> RIP [< inline >] rt6_uncached_list_flush_dev net/ipv6/route.c:156
>> RIP [<ffffffff82a6dad1>] rt6_ifdown+0x481/0x740 net/ipv6/route.c:2621
>> RSP <ffff8800514ffaa0>
>> ---[ end trace 113e678e9b762d96 ]---
>> Kernel panic - not syncing: Fatal exception in interrupt
>> Kernel Offset: disabled
>> ---[ end Kernel panic - not syncing: Fatal exception in interrupt
>>
>>
>> The crash happens because loopback_dev is NULL in
>> rt6_uncached_list_flush_dev(). The crash happens only if there is an
>> uncached route when the interface is destroyed.
>>
>> I've tried to run the program with the following patch applied:
>>
>> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
>> index dc7d970..fd7e88d 100644
>> --- a/drivers/net/loopback.c
>> +++ b/drivers/net/loopback.c
>> @@ -144,6 +144,8 @@ static int loopback_dev_init(struct net_device *dev)
>>
>> static void loopback_dev_free(struct net_device *dev)
>> {
>> + pr_err("loopback_dev_free %p = %p", &dev_net(dev)->loopback_dev, dev_net(dev)->loopback_dev);
>> + WARN_ON(1);
>> dev_net(dev)->loopback_dev = NULL;
>> free_percpu(dev->lstats);
>> free_netdev(dev);
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index f204089..fd558a4 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -142,6 +142,8 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
>> struct net_device *loopback_dev = net->loopback_dev;
>> int cpu;
>>
>> + pr_err("rt6_uncached_list_flush_dev %p = %p", &net->loopback_dev, net->loopback_dev);
>> + WARN_ON(1);
>> for_each_possible_cpu(cpu) {
>> struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu);
>> struct rt6_info *rt;
>>
>>
>> And it shows that the loopback device is destroyed before
>> rt6_uncached_list_flush_dev is executed, while
>> rt6_uncached_list_flush_dev seems to assume that loopback_dev is alive
>> when it is called:
>>
>> [ 197.812174] loopback_dev_free ffff88003d288150 = ffff88003e1d67c0
>> [ 197.812890] ------------[ cut here ]------------
>> [ 197.813389] WARNING: CPU: 2 PID: 1044 at drivers/net/loopback.c:148
>> loopback_dev_free+0x3c/0x70()
>> [ 197.814186] Modules linked in:
>> [ 197.814478] CPU: 2 PID: 1044 Comm: kworker/u8:1 Tainted: G W
>> 4.3.0-rc3+ #45
>> [ 197.815186] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS Bochs 01/01/2011
>> [ 197.815886] Workqueue: netns cleanup_net
>> [ 197.816256] ffffffff81c27c67 ffff88003d923c50 ffffffff812fe8d6
>> 0000000000000000
>> [ 197.816949] ffff88003d923c88 ffffffff81051ff1 ffff88003e1d67c0
>> ffff88003e1d6bd0
>> [ 197.817662] 00000000fffe70d4 00000000fffe70d4 00000000000003e8
>> ffff88003d923c98
>> [ 197.818367] Call Trace:
>> [ 197.818589] [<ffffffff812fe8d6>] dump_stack+0x44/0x5e
>> [ 197.819048] [<ffffffff81051ff1>] warn_slowpath_common+0x81/0xc0
>> [ 197.819573] [<ffffffff810520e5>] warn_slowpath_null+0x15/0x20
>> [ 197.820088] [<ffffffff8151e36c>] loopback_dev_free+0x3c/0x70
>> [ 197.820588] [<ffffffff81698c71>] netdev_run_todo+0x211/0x300
>> [ 197.821096] [<ffffffff816915b2>] ? rollback_registered_many+0x222/0x2b0
>> [ 197.823461] [<ffffffff816a2dc9>] rtnl_unlock+0x9/0x10
>> [ 197.824109] [<ffffffff81692683>] default_device_exit_batch+0x133/0x150
>> [ 197.824924] [<ffffffff81087f10>] ? __wake_up_sync+0x10/0x10
>> [ 197.825608] [<ffffffff8168b97d>] ops_exit_list.isra.6+0x4d/0x60
>> [ 197.826335] [<ffffffff8168c87c>] cleanup_net+0x17c/0x230
>> [ 197.826963] [<ffffffff81067c7e>] process_one_work+0x13e/0x3c0
>> [ 197.827645] [<ffffffff81068015>] worker_thread+0x115/0x450
>> [ 197.828305] [<ffffffff81856241>] ? __schedule+0x311/0x870
>> [ 197.828935] [<ffffffff81067f00>] ? process_one_work+0x3c0/0x3c0
>> [ 197.829642] [<ffffffff8106d044>] kthread+0xc4/0xe0
>> [ 197.830220] [<ffffffff8106cf80>] ? kthread_park+0x50/0x50
>> [ 197.830853] [<ffffffff81859e6f>] ret_from_fork+0x3f/0x70
>> [ 197.831486] [<ffffffff8106cf80>] ? kthread_park+0x50/0x50
>> [ 197.832129] ---[ end trace 54eee6f54dedacca ]---
>>
>> [ 197.835015] IPv6: rt6_uncached_list_flush_dev ffff88003d288150 =
>> (null)
>> [ 197.835641] ------------[ cut here ]------------
>> [ 197.836083] WARNING: CPU: 2 PID: 1044 at net/ipv6/route.c:146
>> rt6_ifdown+0xc7/0x220()
>> [ 197.836738] Modules linked in:
>> [ 197.837022] CPU: 2 PID: 1044 Comm: kworker/u8:1 Tainted: G W
>> 4.3.0-rc3+ #45
>> [ 197.837714] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS Bochs 01/01/2011
>> [ 197.838395] Workqueue: netns cleanup_net
>> [ 197.838738] ffffffff81c3ac07 ffff88003d923cc8 ffffffff812fe8d6
>> 0000000000000000
>> [ 197.839410] ffff88003d923d00 ffffffff81051ff1 0000000000000000
>> ffff88003d288000
>> [ 197.840079] ffffffff82119b98 0000000000000000 0000000000000000
>> ffff88003d923d10
>> [ 197.840740] Call Trace:
>> [ 197.840952] [<ffffffff812fe8d6>] dump_stack+0x44/0x5e
>> [ 197.841391] [<ffffffff81051ff1>] warn_slowpath_common+0x81/0xc0
>> [ 197.841848] [<ffffffff810520e5>] warn_slowpath_null+0x15/0x20
>> [ 197.842297] [<ffffffff81761407>] rt6_ifdown+0xc7/0x220
>> [ 197.842701] [<ffffffff8177d020>] ? xfrm6_net_exit+0x30/0x40
>> [ 197.843140] [<ffffffff81761c0f>] fib6_net_exit+0xf/0x60
>> [ 197.843545] [<ffffffff8168b963>] ops_exit_list.isra.6+0x33/0x60
>> [ 197.843999] [<ffffffff8168c87c>] cleanup_net+0x17c/0x230
>> [ 197.844420] [<ffffffff81067c7e>] process_one_work+0x13e/0x3c0
>> [ 197.844867] [<ffffffff81068015>] worker_thread+0x115/0x450
>> [ 197.845324] [<ffffffff81856241>] ? __schedule+0x311/0x870
>> [ 197.845761] [<ffffffff81067f00>] ? process_one_work+0x3c0/0x3c0
>> [ 197.846288] [<ffffffff8106d044>] kthread+0xc4/0xe0
>> [ 197.846698] [<ffffffff8106cf80>] ? kthread_park+0x50/0x50
>> [ 197.847161] [<ffffffff81859e6f>] ret_from_fork+0x3f/0x70
>> [ 197.847612] [<ffffffff8106cf80>] ? kthread_park+0x50/0x50
>> [ 197.848074] ---[ end trace 54eee6f54dedaccb ]---
>>
>> I use plain defconfig/kvmconfig.
>>
>> Found with syzkaller fuzzer.
>> --
>
> CC Eric W. Biederman <ebiederm@xxxxxxxxxxxx>, who is the expert in this
> area.
>
> Thanks.
>
> Bug was added in 8d0b94afdca84
> ("ipv6: Keep track of DST_NOCACHE routes in case of iface
> down/unregister")
>
> CC Martin KaFai Lau <kafai@xxxxxx>

So I don't quite know what rt6_uncached_list_flush_dev was intended
to be doing, but that code is not correct by a country mile.

What the code attempts to do is to flush every uncached entry when any
network namespace exits, which makes no sense whatsoever.

Further, we are past the point of network devices even existing in a
network namespace, so it does not even make sense to attempt to do
anything with network devices.

So given that there is nothing for rt6_uncached_list_flush_dev to
do in this case, and there is no sensible thing for it to do when
dev == NULL, I recommend removing the dev == NULL support
and just not calling rt6_uncached_list_flush_dev when dev == NULL.
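
To make the suggestion concrete, here is a rough user-space sketch of
the shape I have in mind. It is not kernel code and not a tested patch:
the struct definitions, the singly linked list, and the names
uncached_list_flush_dev() and ifdown() are simplified, hypothetical
stand-ins for the real per-cpu list and for rt6_ifdown(). The only
point is that the flush helper requires a real device, and the caller
skips it when dev == NULL.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct net_device { int ifindex; };

struct net { struct net_device *loopback_dev; };

struct rt_entry {
	struct net_device *dev;   /* device this uncached route refers to */
	struct rt_entry *next;
};

/*
 * Re-point every uncached route that refers to dev at the loopback
 * device. dev must be non-NULL: there is no "match every device" mode.
 * Returns the number of entries that were moved.
 */
static int uncached_list_flush_dev(struct net *net, struct rt_entry *head,
				   struct net_device *dev)
{
	struct net_device *loopback_dev = net->loopback_dev;
	struct rt_entry *rt;
	int moved = 0;

	if (dev == loopback_dev)
		return 0;         /* nothing to move the routes to */

	for (rt = head; rt; rt = rt->next) {
		if (rt->dev == dev) {
			rt->dev = loopback_dev;
			moved++;
		}
	}
	return moved;
}

/*
 * Model of the caller. dev == NULL stands for the namespace exit path,
 * where the devices (loopback included) are already gone, so the
 * uncached list is left alone.
 */
static int ifdown(struct net *net, struct rt_entry *head,
		  struct net_device *dev)
{
	if (!dev)
		return 0;
	return uncached_list_flush_dev(net, head, dev);
}
```

With this shape the namespace exit path never reads net->loopback_dev,
so it no longer matters that the loopback device has already been freed
by the time fib6_net_exit runs.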

Eric