Re: KASAN: stack-out-of-bounds Read in __schedule

From: Daniel Borkmann
Date: Thu Aug 30 2018 - 05:52:25 EST

On 08/30/2018 06:11 AM, Dmitry Vyukov wrote:
> On Wed, Aug 29, 2018 at 7:03 AM, 'Alexander Potapenko' via
> syzkaller-bugs <syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
>> On Wed, Aug 29, 2018 at 3:46 PM Jan Kara <jack@xxxxxxx> wrote:
>>> On Tue 28-08-18 08:30:02, syzbot wrote:
>>>> Hello,
>>>> syzbot found the following crash on:
>>>> HEAD commit: 5b394b2ddf03 Linux 4.19-rc1
>>>> git tree: upstream
>>>> console output:
>>>> kernel config:
>>>> dashboard link:
>>>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>>>> syz repro:
>>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>>> Reported-by: syzbot+45a34334c61a8ecf661d@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>> IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
>>>> IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
>>>> IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
>>>> 8021q: adding VLAN 0 to HW filter on device team0
>>>> ==================================================================
>>>> BUG: KASAN: stack-out-of-bounds in schedule_debug kernel/sched/core.c:3285
>>>> [inline]
>>>> BUG: KASAN: stack-out-of-bounds in __schedule+0x1977/0x1df0
>>>> kernel/sched/core.c:3395
>>>> Read of size 8 at addr ffff8801ad090000 by task syz-executor0/4718
>>> Weird, can you please help me decipher this? So here KASAN complains about
>>> wrong memory access in the scheduler.
> This looks like a result of a previous bad silent memory corruption.
> The KASAN report says there is a stack out-of-bounds in the scheduler. And
> that is followed by a slab corruption report in another task.
> fs/jbd2/transaction.c happens to be the first meaningful file in this
> crash, and so that's where it is attributed to.
> Rerunning the reproducer several times can maybe give some better
> clues, or maybe not, maybe they all will look equally puzzling.
> This part of the repro looks familiar:
> r1 = bpf$MAP_CREATE(0x0, &(0x7f0000002e40)={0x12, 0x0, 0x4, 0x6e, 0x0,
> 0x1}, 0x68)
> bpf$MAP_UPDATE_ELEM(0x2, &(0x7f0000000180)={r1, &(0x7f0000000000),
> &(0x7f0000000140)}, 0x20)
> We had exactly such consequences of a bug in bpf map very recently,
> but that was claimed to be fixed. Maybe not completely?
> +bpf maintainers

Looks like syzbot found this in Linus' tree with HEAD commit 5b394b2ddf03 ("Linux 4.19-rc1");
one day later the net PR got merged via 050cdc6c9501 ("Merge git://").

This PR contained a couple of fixes I did on sockmap code during audit such as:

Looking at the reproducer that syzkaller found, it contains:

r1 = bpf$MAP_CREATE(0x0, &(0x7f0000002e40)={0x12, 0x0, 0x4, 0x6e, 0x0, 0x1}, 0x68)

So it found the crash with a map type of sock hash (0x12) and a key size of 0x0 (which is
invalid), where the subsequent map update triggered the corruption. I just ran a 'syz test'
and it was no longer able to trigger the crash.
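
For reference, a minimal userspace sketch of how the attribute block in the reproducer
decodes, assuming the six words map onto the leading u32 fields of union bpf_attr for
BPF_MAP_CREATE in UAPI order. The struct name, the macro and sockhash_check_attr() are
hypothetical names used for illustration only; this is not the code from the fix, just
the kind of size check that rejects a zero-sized key:

```c
#include <errno.h>
#include <stdint.h>

/* Leading u32 fields of union bpf_attr as used by BPF_MAP_CREATE (UAPI order). */
struct map_create_attr {
	uint32_t map_type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t map_flags;
	uint32_t inner_map_fd;
};

/* BPF_MAP_TYPE_SOCKHASH is 18 (0x12) in the UAPI enum; local name is illustrative. */
#define MAP_TYPE_SOCKHASH 0x12

/* Attribute block taken verbatim from the syzkaller reproducer. */
static const struct map_create_attr repro_attr = {
	.map_type     = 0x12,	/* sock hash */
	.key_size     = 0x0,	/* invalid: zero-sized key */
	.value_size   = 0x4,
	.max_entries  = 0x6e,
	.map_flags    = 0x0,
	.inner_map_fd = 0x1,
};

/*
 * Hypothetical sanity check: returns 0 if the sizes look usable for a
 * sock hash, -EINVAL otherwise. The value_size == 4 requirement is an
 * assumption of this sketch (the value being a socket fd).
 */
static int sockhash_check_attr(const struct map_create_attr *attr)
{
	if (attr->map_type != MAP_TYPE_SOCKHASH)
		return -EINVAL;
	if (attr->max_entries == 0 || attr->key_size == 0 ||
	    attr->value_size != 4)
		return -EINVAL;
	return 0;
}
```

With such a check, sockhash_check_attr(&repro_attr) returns -EINVAL, so the map is never
created and the later bpf$MAP_UPDATE_ELEM has nothing to operate on.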

#syz fix: bpf, sockmap: fix sock_hash_alloc and reject zero-sized keys