Re: UBSAN: array-index-out-of-bounds in alg_bind

From: Jann Horn
Date: Sat Oct 17 2020 - 07:02:50 EST


On Sat, Oct 17, 2020 at 12:50 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Sat, Oct 17, 2020 at 5:49 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > On Fri, Oct 16, 2020 at 01:12:24AM -0700, syzbot wrote:
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=92ead4eb8e26a26d465e
> > > [...]
> > > Reported-by: syzbot+92ead4eb8e26a26d465e@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > [...]
> > > UBSAN: array-index-out-of-bounds in crypto/af_alg.c:166:2
> > > index 91 is out of range for type '__u8 [64]'
> >
> > This seems to be an "as intended", if very odd. false positive (the actual
> > memory area is backed by the on-stack _K_SS_MAXSIZE-sized sockaddr_storage
> > "address" variable in __sys_bind. But yes, af_alg's salg_name member
> > size here doesn't make sense.
>
> As Vegard noted elsewhere, compilers can start making assumptions
> based on absence of UB and compile code in surprising ways as the
> result leading to very serious and very real bugs.
>
> One example would be a compiler generating jump table for common sizes
> during PGO and leaving size > 64 as wild jump.
>
> Another example would be a compiler assuming that copy size <= 64.
> Then if there is another copy into a 64-byte buffer with a proper size
> check, the compiler can now drop that size check (since it now knows
> size <= 64) and we get real stack smash (for a copy that does have a
> proper size check before!).

FWIW, the kernel currently still has a bunch of places that use
C89-style length-1 arrays (which were in the past used to work around
C89's lack of proper flexible arrays). Gustavo A. R. Silva has a bunch
of patches pending to change those places now, but those are not
marked for stable backporting; so in all currently released kernels,
we'll probably keep having length-1 arrays at the ends of C structs
that are used as if they were flexible arrays. (Unless someone makes
the case that these patches are not just cleanups but actually fix
some sort of real bug, and therefore need to be backported.)

The code in this example looks just like one of those C89-style
length-1 arrays to me (except that the length isn't 1).

Of course I do agree that this should be cleaned up, and that having
bogus array lengths in the source code is a bad idea.

> And we do want compilers to be that smart today. Because of all levels
> of abstractions/macros/inlining we actually have lots of
> redundant/nonsensical code in the end after all inlining and
> expansions, and we do want compilers to infer things, remove redundant
> checks, etc so that we can have both nice abstract source code and
> efficient machine code at the same time.

I guess that kinda leads to the question: Do we just need to fix the
kernel code here (which is comparatively easy), or do you think that
this is a sufficiently big problem that we need to go and somehow
change the actual UAPI headers here (e.g. by deprecating the existing
UAPI struct and making a new one with a different name)?