Re: m68k boot failure in -next bisected to 'xarray: Replace exceptional entries'

From: Matthew Wilcox
Date: Sat Jun 23 2018 - 03:47:15 EST


On Fri, Jun 22, 2018 at 03:33:35PM -0700, Guenter Roeck wrote:
> On Fri, Jun 22, 2018 at 02:05:19PM -0700, Matthew Wilcox wrote:
> > On Fri, Jun 22, 2018 at 11:42:46AM -0700, Guenter Roeck wrote:
> > > Hi,
> > >
> > > a few days ago, m68k boot tests in linux-next started to crash.
> > > I bisected the problem to commit 'xarray: Replace exceptional entries'.
> > > Bisect and crash logs are attached below.
> >
> > Thank you! I was afraid something like this might happen.
> >
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 1 at lib/idr.c:42 idr_alloc_u32+0x44/0xe8
> >
> > Line 42 is:
> >
> > 	if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
> > 		return -EINVAL;
> >
> > The pointer passed in to idr_alloc() is not 4-byte aligned; it's aligned
> > to a 2 byte boundary. I'm having a little trouble seeing who it is that's
> > passing in what pointer ...
> >
> > > Call Trace: [<000180d6>] __warn+0xc0/0xc2
> > > [<000020e8>] do_one_initcall+0x0/0x140
> > > [<0001816a>] warn_slowpath_null+0x26/0x2c
> > > [<002b50e4>] idr_alloc_u32+0x44/0xe8
> > > [<002b50e4>] idr_alloc_u32+0x44/0xe8
> > > [<002b51e4>] idr_alloc+0x5c/0x76
> > > [<00247160>] genl_register_family+0x14c/0x54c
> >
> > It makes sense to here (other than idr_alloc being listed twice)
> >
> > > [<000020e8>] do_one_initcall+0x0/0x140
> > > [<003f0f02>] genl_init+0x0/0x34
> >
> > Assuming this is right, that would imply that genl_ctrl is not 4-byte
> > aligned. Is that true? I'm not familiar with the m68k alignment rules,
> > but it has a lot of 4-byte sized quantities in the struct, so I would
> > assume it's 4-byte aligned.
> >
> > > [<003f0ce6>] bpf_lwt_init+0x10/0x14
> >
> > I don't think this is the caller.
> >
>
> Here is the culprit:
>
> genl_register_family(0x36dd7a) registering VFS_DQUOT
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at lib/idr.c:42 idr_alloc_u32+0x44/0xe8
>
> It may be odd that fs/quota/netlink.c:quota_genl_family is not word
> aligned, but on the other side I don't think there is a rule that
> the function parameter to genl_register_family() - or the second
> parameter of idr_alloc() - must be word aligned. Am I missing
> something ? After all, it could be a pointer to the nth element
> of a string, or the caller could on purpose allocate IDRs for
> (ptr), (ptr + 1), and so on.

There actually is a rule that pointers passed to the IDR be aligned.
It might not be written down anywhere ;-) And I'm quite happy to lift
that restriction; after all I don't want to force everybody to decorate
definitions with __aligned(4).
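
To illustrate why a 2-byte-aligned pointer trips the check, here is a
minimal userspace sketch. It assumes that internal entries are tagged by
setting the low two bits of the pointer to 2; the macro and function names
below are hypothetical and are not the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the tree uses the low two bits of an entry as a tag,
 * and a tag value of 2 marks an internal entry. */
#define ENTRY_MASK    3UL
#define INTERNAL_TAG  2UL

/* True if the address would be mistaken for an internal entry. */
static bool looks_internal(uintptr_t addr)
{
	return (addr & ENTRY_MASK) == INTERNAL_TAG;
}

/* True if the address is 4-byte aligned, so both tag bits are clear. */
static bool is_aligned4(uintptr_t addr)
{
	return (addr & ENTRY_MASK) == 0;
}
```

With the address from the report, 0x36dd7a, the low two bits are binary 10,
so looks_internal() is true and is_aligned4() is false. Any 4-byte-aligned
address has both bits clear and can never match the tag.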

I'll see what I can do to fix it. I'm actually on holiday this week,
so a fix may be delayed.