Re: net/ipv4: use-after-free in add_grec
From: Eric Dumazet
Date: Wed May 31 2017 - 19:55:59 EST
On Wed, May 31, 2017 at 4:49 PM, Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
> On Wed, May 31, 2017 at 9:12 AM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>> On Wed, 2017-05-31 at 11:46 +0200, Andrey Konovalov wrote:
>>> Hi,
>>>
>>> I've got the following error report while fuzzing the kernel with syzkaller.
>>>
>>> On commit 5ed02dbb497422bf225783f46e6eadd237d23d6b (4.12-rc3).
>>>
>>> Unfortunately it's not reproducible.
>>>
>>> ==================================================================
>>> BUG: KASAN: use-after-free in add_grec+0x101e/0x1090 net/ipv4/igmp.c:473
>>> Read of size 8 at addr ffff88003053c1a0 by task swapper/0/0
>>>
>>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-rc3+ #370
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>> Call Trace:
>>> <IRQ>
>>> __dump_stack lib/dump_stack.c:16 [inline]
>>> dump_stack+0x292/0x395 lib/dump_stack.c:52
>>> print_address_description+0x73/0x280 mm/kasan/report.c:252
>>> kasan_report_error mm/kasan/report.c:351 [inline]
>>> kasan_report+0x22b/0x340 mm/kasan/report.c:408
>>> __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:429
>>> add_grec+0x101e/0x1090 net/ipv4/igmp.c:473
>>> igmpv3_send_cr net/ipv4/igmp.c:663 [inline]
>>> igmp_ifc_timer_expire+0x46d/0xa80 net/ipv4/igmp.c:768
>>> IPVS: length: 51 != 8
>>> call_timer_fn+0x23f/0x800 kernel/time/timer.c:1268
>>> expire_timers kernel/time/timer.c:1307 [inline]
>>> __run_timers+0x94e/0xcd0 kernel/time/timer.c:1601
>>> run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
>>> __do_softirq+0x2fb/0xb99 kernel/softirq.c:284
>>> invoke_softirq kernel/softirq.c:364 [inline]
>>> irq_exit+0x19e/0x1d0 kernel/softirq.c:405
>>> exiting_irq arch/x86/include/asm/apic.h:652 [inline]
>>> smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:966
>>> apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:481
>>> RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
>>> RSP: 0018:ffffffff85a079a8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff10
>>> RAX: dffffc0000000020 RBX: 1ffffffff0b40f38 RCX: 0000000000000000
>>> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff85a2a9e4
>>> RBP: ffffffff85a079a8 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
>>> R13: ffffffff85a07a60 R14: ffffffff86171338 R15: 1ffffffff0b40f5b
>>> </IRQ>
>>> arch_safe_halt arch/x86/include/asm/paravirt.h:98 [inline]
>>> default_idle+0x8f/0x440 arch/x86/kernel/process.c:341
>>> arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:332
>>> default_idle_call+0x36/0x60 kernel/sched/idle.c:98
>>> cpuidle_idle_call kernel/sched/idle.c:156 [inline]
>>> do_idle+0x348/0x420 kernel/sched/idle.c:245
>>> cpu_startup_entry+0x18/0x20 kernel/sched/idle.c:350
>>> rest_init+0x18d/0x1a0 init/main.c:415
>>> start_kernel+0x747/0x779 init/main.c:679
>>> x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:196
>>> x86_64_start_kernel+0x132/0x141 arch/x86/kernel/head64.c:177
>>> secondary_startup_64+0x9f/0x9f arch/x86/kernel/head_64.S:304
>>>
>>> Allocated by task 30543:
>>> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>> save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>>> set_track mm/kasan/kasan.c:525 [inline]
>>> kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:617
>>> kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745
>>> kmalloc include/linux/slab.h:492 [inline]
>>> kzalloc include/linux/slab.h:665 [inline]
>>> ip_mc_add1_src net/ipv4/igmp.c:1909 [inline]
>>> ip_mc_add_src+0x6cd/0x1020 net/ipv4/igmp.c:2033
>>> ip_mc_msfilter+0x5e5/0xcf0 net/ipv4/igmp.c:2403
>>> do_ip_setsockopt.isra.12+0x2d47/0x38c0 net/ipv4/ip_sockglue.c:959
>>> ip_setsockopt+0x3a/0xb0 net/ipv4/ip_sockglue.c:1256
>>> tcp_setsockopt+0x82/0xd0 net/ipv4/tcp.c:2740
>>> sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2844
>>> SYSC_setsockopt net/socket.c:1798 [inline]
>>> SyS_setsockopt+0x270/0x3a0 net/socket.c:1777
>>> entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>
>>> Freed by task 30543:
>>> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>> save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>>> set_track mm/kasan/kasan.c:525 [inline]
>>> kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:590
>>> slab_free_hook mm/slub.c:1357 [inline]
>>> slab_free_freelist_hook mm/slub.c:1379 [inline]
>>> slab_free mm/slub.c:2961 [inline]
>>> kfree+0xe8/0x2b0 mm/slub.c:3882
>>> ip_mc_clear_src+0x69/0x1c0 net/ipv4/igmp.c:2078
>>> ip_mc_dec_group+0x19a/0x470 net/ipv4/igmp.c:1618
>>> ip_mc_drop_socket+0x145/0x230 net/ipv4/igmp.c:2609
>>> inet_release+0x4e/0x1c0 net/ipv4/af_inet.c:411
>>> sock_release+0x8d/0x1e0 net/socket.c:597
>>> sock_close+0x16/0x20 net/socket.c:1072
>>> __fput+0x332/0x7f0 fs/file_table.c:209
>>> ____fput+0x15/0x20 fs/file_table.c:245
>>> task_work_run+0x19b/0x270 kernel/task_work.c:116
>>> exit_task_work include/linux/task_work.h:21 [inline]
>>> do_exit+0x18a3/0x2820 kernel/exit.c:878
>>> do_group_exit+0x149/0x420 kernel/exit.c:982
>>> get_signal+0x76d/0x1790 kernel/signal.c:2318
>>> do_signal+0xd2/0x2130 arch/x86/kernel/signal.c:808
>>> exit_to_usermode_loop+0x17a/0x210 arch/x86/entry/common.c:157
>>> prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>>> syscall_return_slowpath+0x3ba/0x410 arch/x86/entry/common.c:263
>>> entry_SYSCALL_64_fastpath+0xbc/0xbe
>>>
>>> The buggy address belongs to the object at ffff88003053c1a0
>>> which belongs to the cache kmalloc-64 of size 64
>>> The buggy address is located 0 bytes inside of
>>> 64-byte region [ffff88003053c1a0, ffff88003053c1e0)
>>> The buggy address belongs to the page:
>>> page:ffffea0000c14f00 count:1 mapcount:0 mapping: (null)
>>> index:0x0 compound_mapcount: 0
>>> flags: 0x100000000008100(slab|head)
>>> raw: 0100000000008100 0000000000000000 0000000000000000 0000000100140014
>>> raw: ffffea0000c2f520 ffffea0000e20aa0 ffff88003e80f740 0000000000000000
>>> page dumped because: kasan: bad access detected
>>>
>>> Memory state around the buggy address:
>>> ffff88003053c080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>> ffff88003053c100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>> >ffff88003053c180: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
>>> ^
>>> ffff88003053c200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>> ffff88003053c280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>> ==================================================================
>>
>> I have the feeling that ip_mc_clear_src() is called too soon.
>>
>> We should call it once all users have released their reference.
>
> In this case, ip_mc_clear_src() is called when the ->users hits
> zero, therefore I think we possibly miss the refcnt on ->users.
>
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index 44fd86d..c2f3347 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
> @@ -1886,6 +1886,7 @@ static int ip_mc_del_src(struct in_device *in_dev, __be32 *pmca, int sfmode,
> igmp_ifc_event(pmc->interface);
> #endif
> }
> + pmc->users--;
> out_unlock:
> spin_unlock_bh(&pmc->lock);
> return err;
> @@ -2025,6 +2026,7 @@ static int ip_mc_add_src(struct in_device *in_dev, __be32 *pmca, int sfmode,
> #ifdef CONFIG_IP_MULTICAST
> sf_markstate(pmc);
> #endif
> + pmc->users++;
> isexclude = pmc->sfmode == MCAST_EXCLUDE;
> if (!delta)
> pmc->sfcount[sfmode]++;

The issue here is the timer firing after ip_mc_clear_src() has
already been called.

My patch should fix the problem.

An alternative would be to use del_timer_sync() instead of del_timer()
in igmp_stop_timer(), but that change would be more invasive, since
del_timer_sync() would have to be called while the im->lock spinlock
is not held.
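
To illustrate the ordering constraint behind the del_timer_sync()
alternative, here is a minimal userspace sketch, not kernel code. It
assumes the timer handler itself takes the lock, which is why the
synchronous stop has to happen before the lock is acquired. All names
(fake_mc, fake_timer_fn, fake_stop_timer_sync, fake_teardown, run_demo)
are hypothetical: a pthread stands in for the timer and pthread_join()
stands in for del_timer_sync().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a multicast group entry. */
struct fake_mc {
	pthread_mutex_t lock;	/* plays the role of im->lock */
	pthread_t timer;	/* plays the role of the timer */
	int *sources;		/* plays the role of the source list */
	int reports;		/* number of sources the handler reported */
};

/* "Timer handler": takes the lock and reads the source list. */
static void *fake_timer_fn(void *arg)
{
	struct fake_mc *mc = arg;

	pthread_mutex_lock(&mc->lock);
	if (mc->sources)
		mc->reports += mc->sources[0];
	pthread_mutex_unlock(&mc->lock);
	return NULL;
}

/*
 * Analogue of del_timer_sync(): returns only once the handler is done.
 * Must be called WITHOUT mc->lock held, because the handler needs that
 * lock to finish; holding it here would deadlock.
 */
static void fake_stop_timer_sync(struct fake_mc *mc)
{
	pthread_join(mc->timer, NULL);
}

/* Analogue of the teardown path: stop the timer first, then free. */
static int fake_teardown(struct fake_mc *mc)
{
	int reports;

	fake_stop_timer_sync(mc);	/* lock not held here */

	pthread_mutex_lock(&mc->lock);
	assert(mc->sources != NULL);	/* teardown runs only once */
	free(mc->sources);		/* safe: the handler can no longer run */
	mc->sources = NULL;
	reports = mc->reports;
	pthread_mutex_unlock(&mc->lock);
	return reports;
}

/* Runs one handler invocation against one teardown; returns reports. */
int run_demo(void)
{
	struct fake_mc mc;
	int reports;

	pthread_mutex_init(&mc.lock, NULL);
	mc.reports = 0;
	mc.sources = malloc(sizeof(*mc.sources));
	if (!mc.sources) {
		pthread_mutex_destroy(&mc.lock);
		return -1;
	}
	mc.sources[0] = 1;

	if (pthread_create(&mc.timer, NULL, fake_timer_fn, &mc) != 0) {
		free(mc.sources);
		pthread_mutex_destroy(&mc.lock);
		return -1;
	}

	reports = fake_teardown(&mc);
	pthread_mutex_destroy(&mc.lock);
	return reports;
}
```

Because the free happens only after the synchronous stop returns, the
handler always sees a valid source list, whichever thread runs first.
Moving fake_stop_timer_sync() inside the locked region would hang as
soon as the handler blocks on the lock, which is the restructuring cost
mentioned above.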