Re: [syzbot] WARNING in __init_work

From: Thomas Gleixner
Date: Sun Sep 19 2021 - 08:41:36 EST


Stephen,

On Wed, Sep 15 2021 at 19:29, Stephen Boyd wrote:
> Quoting Andrew Morton (2021-09-15 16:14:57)
>> On Wed, 15 Sep 2021 10:00:22 -0700 syzbot <syzbot+d6c75f383e01426a40b4@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>> >
>> > ODEBUG: object ffffc90000fd8bc8 is NOT on stack ffffc900022a0000, but annotated.
>
> This is saying that the object was supposed to be on the stack because
> debug objects was told that, but it isn't on the stack per the
> definition of object_is_on_stack().

Correct.

>> > <IRQ>
>> > __init_work+0x2d/0x50 kernel/workqueue.c:519
>> > synchronize_rcu_expedited+0x392/0x620 kernel/rcu/tree_exp.h:847
>
> This line looks like
>
> INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
>
> inside synchronize_rcu_expedited(). The rew structure is declared on the
> stack
>
> struct rcu_exp_work rew;

Yes, but object_is_on_stack() checks for task stacks only. And the splat
here is entirely correct:

  softirq()
    ...
      synchronize_rcu_expedited()
        INIT_WORK_ONSTACK()
        queue_work()
        wait_event()

is obviously broken. You cannot wait in soft irq context.
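
To illustrate why the ODEBUG check fires, here is a minimal userspace sketch of
the range test behind object_is_on_stack(): the object must lie inside the
current task's stack. The model_* and splat_* names are hypothetical, and the
16 KiB stack size is an assumption (KASAN builds use larger stacks). The two
addresses are the ones from the ODEBUG line in the report. The object sits
below the task stack base, so the assumed size does not change the outcome.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 16 KiB task stacks; the real THREAD_SIZE depends on the config. */
#define THREAD_SIZE_ASSUMED (16u * 1024u)

/*
 * Userspace model of the range check done by object_is_on_stack():
 * true only if obj lies inside [task_stack, task_stack + THREAD_SIZE).
 * Only the task stack is considered, never an irq stack.
 */
static bool model_object_is_on_stack(uint64_t obj, uint64_t task_stack)
{
	return obj >= task_stack && obj < task_stack + THREAD_SIZE_ASSUMED;
}

/* Addresses copied from the ODEBUG line in the report. */
static const uint64_t splat_obj = 0xffffc90000fd8bc8ULL;
static const uint64_t splat_task_stack = 0xffffc900022a0000ULL;
```

With these values model_object_is_on_stack(splat_obj, splat_task_stack) is
false, which matches the "NOT on stack ..., but annotated" message.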

synchronize_rcu_expedited() should really have a might_sleep() at the
beginning to make that more obvious.

The splat is clobbered btw:

[ 416.415111][ C1] ODEBUG: object ffffc90000fd8bc8 is NOT on stack ffffc900022a0000, but annotated.
[ 416.423424][T14850] truncated
[ 416.431623][ C1] ------------[ cut here ]------------
[ 416.438913][T14850] ------------[ cut here ]------------
[ 416.440189][ C1] WARNING: CPU: 1 PID: 2971 at lib/debugobjects.c:548 __debug_object_init.cold+0x252/0x2e5
[ 416.455797][T14850] refcount_t: addition on 0; use-after-free.

So there is a refcount_t violation as well.

Nevertheless, a hint for finding the culprit is right there in that
call chain:

>> > bdi_remove_from_list mm/backing-dev.c:938 [inline]
>> > bdi_unregister+0x177/0x5a0 mm/backing-dev.c:946
>> > release_bdi+0xa1/0xc0 mm/backing-dev.c:968
>> > kref_put include/linux/kref.h:65 [inline]
>> > bdi_put+0x72/0xa0 mm/backing-dev.c:976
>> > bdev_free_inode+0x116/0x220 fs/block_dev.c:819
>> > i_callback+0x3f/0x70 fs/inode.c:224

The inode code uses RCU for freeing an inode object, which then ends up
calling bdi_put() and subsequently synchronize_rcu_expedited().

>> > rcu_do_batch kernel/rcu/tree.c:2508 [inline]
>> > rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2743
>> > __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
>> > invoke_softirq kernel/softirq.c:432 [inline]
>> > __irq_exit_rcu+0x123/0x180 kernel/softirq.c:636
>> > irq_exit_rcu+0x5/0x20 kernel/softirq.c:648
>> > sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097
>> > </IRQ>
>>
>> Seems that we have a debugobject in the incorrect state, but it doesn't
>> necessarily mean there's something wrong in the bdi code. It's just
>> that the bdi code happened to be the place which called
>> synchronize_rcu_expedited().

Again, it cannot do that from a softirq because
synchronize_rcu_expedited() might sleep.
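
As a rough illustration of what a might_sleep() style check buys you, here is a
small userspace model. It is not kernel code: the context enum, the model_*
functions and the warning text are all made up for this sketch. The point is
only that a check at function entry flags the bad caller immediately, instead
of the problem showing up later as a confusing debug objects splat.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical execution contexts for this model. */
enum model_ctx { CTX_TASK, CTX_SOFTIRQ, CTX_HARDIRQ };

static enum model_ctx current_ctx = CTX_TASK;
static unsigned int atomic_sleep_warnings;

/* Model of a might_sleep() style check: complain outside task context. */
static bool model_might_sleep(const char *caller)
{
	if (current_ctx == CTX_TASK)
		return true;

	atomic_sleep_warnings++;
	fprintf(stderr, "model: %s may sleep, but was called from atomic context\n",
		caller);
	return false;
}

/* Stand-in for a blocking function, e.g. one that ends in wait_event(). */
static bool model_sync_expedited(void)
{
	if (!model_might_sleep(__func__))
		return false;

	/* The real function would queue on-stack work and wait for it here. */
	return true;
}

/* Stand-in for an RCU callback that runs in softirq context. */
static bool model_rcu_callback_from_softirq(void)
{
	enum model_ctx saved = current_ctx;
	bool ok;

	current_ctx = CTX_SOFTIRQ;
	ok = model_sync_expedited();
	current_ctx = saved;
	return ok;
}
```

Calling model_sync_expedited() directly succeeds, while going through
model_rcu_callback_from_softirq() trips the check and bumps
atomic_sleep_warnings.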

> Is it possible that object_is_on_stack() doesn't work in IRQ context?
> I'm not really following along on x86 but I could see where
> task_stack_page() gets the wrong "stack" pointer because the task has one
> stack and the irq stack is some per-cpu dedicated allocation?

Even if debug objects supported objects on irq stacks, the above would
still be bogus. But it does not and will not, because the operations here
have to be fully synchronous:

init() -> queue() or arm() -> wait() -> destroy()

because you obviously cannot queue work or arm a timer which lives on the
stack and then leave the function without waiting for the operation to
complete.

So these operations have to be synchronous, which is a no-no when running
in hard or soft interrupt context because waiting for the operation to
complete is not possible there.
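
The init() -> queue() -> wait() -> destroy() sequence above can be sketched as a
single-threaded userspace model. Everything here is hypothetical (the
model_work structure, the one-slot queue, the helper names), and the "wait"
step just runs the worker inline, where the kernel would sleep. It only shows
the lifetime rule: the on-stack item must be finished before the function
returns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item; in this model it lives on the caller's stack. */
struct model_work {
	void (*fn)(struct model_work *w);
	bool pending;
	int result;
};

/* One-slot "workqueue" holding a pointer to a caller-owned item. */
static struct model_work *model_queue;

static void model_init_work(struct model_work *w,
			    void (*fn)(struct model_work *))
{
	w->fn = fn;
	w->pending = false;
	w->result = 0;
}

static bool model_queue_work(struct model_work *w)
{
	if (model_queue)
		return false;	/* slot already taken */
	w->pending = true;
	model_queue = w;
	return true;
}

/* Stand-in for the worker: runs whatever is queued. */
static void model_run_worker(void)
{
	struct model_work *w = model_queue;

	if (!w)
		return;
	model_queue = NULL;
	w->fn(w);
	w->pending = false;
}

/* Stand-in for the wait step; the kernel would sleep here. */
static void model_wait_for_work(struct model_work *w)
{
	while (w->pending)
		model_run_worker();
}

/* Destroying is only safe once nothing references the item anymore. */
static bool model_destroy_work(const struct model_work *w)
{
	return !w->pending;
}

static void model_handler(struct model_work *w)
{
	w->result = 42;
}

/* init -> queue -> wait -> destroy, all before the stack frame goes away. */
static int model_sync_operation(void)
{
	struct model_work w;

	model_init_work(&w, model_handler);
	if (!model_queue_work(&w))
		return -1;
	model_wait_for_work(&w);
	if (!model_destroy_work(&w))
		return -1;
	return w.result;
}
```

Dropping the model_wait_for_work() call would leave model_queue pointing into a
dead stack frame, which is exactly why the wait step cannot be skipped and why
the whole sequence is off limits in contexts that cannot wait.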

Thanks,

tglx