Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

From: Sedat Dilek
Date: Fri Sep 25 2015 - 08:36:30 EST


On Fri, Sep 25, 2015 at 2:00 PM, Jiri Kosina <jikos@xxxxxxxxxx> wrote:
> On Thu, 24 Sep 2015, Sedat Dilek wrote:
>
>> >> >> > >> [ 24.705767] [<ffffffff8149287d>] dump_stack+0x7d/0xa0
>> >> >> > >> [ 24.705774] [<ffffffff810cbf7a>] ___might_sleep+0x28a/0x2a0
>> >> >> > >> [ 24.705779] [<ffffffff810cbc7f>] __might_sleep+0x4f/0xc0
>> >> >> > >> [ 24.705784] [<ffffffff810ae8ff>] start_flush_work+0x2f/0x290
>> >> >> > >> [ 24.705789] [<ffffffff810ae8ac>] flush_work+0x5c/0x80
>> >> >> > >> [ 24.705792] [<ffffffff810ae86a>] ? flush_work+0x1a/0x80
>> >> >> > >> [ 24.705799] [<ffffffff810eddcd>] ? trace_hardirqs_off+0xd/0x10
>> >> >> > >> [ 24.705804] [<ffffffff810ad938>] ? try_to_grab_pending+0x48/0x360
>> >> >> > >> [ 24.705810] [<ffffffff81917e13>] ? _raw_spin_lock_irqsave+0x73/0x80
>> >> >> > >> [ 24.705814] [<ffffffff810aecf9>] __cancel_work_timer+0x179/0x260
>> >> >>
>> >> >> This one is even more strange. It says that flush_work() is being called
>> >> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
>> >> >> restored just one statement before that, and usbhid_close() explicitly
>> >> >> calls cancel_work_sync() after unconditionally enabling interrupts.
>> >> >>
>> >> >> So I am not able to make any sense of either of the traces really.
>> >> >>
>> >> >> Are you seeing this with the same .config with GCC-compiled kernel as
>> >> >> well?
>> >> >
>> >> > Actually could you please provide disassembly of your
>> >> > __cancel_work_timer()?
>> >> >
>> >>
>> >> Disassembly of which file - corresponding workqueue or hid file?
>> >
>> > make kernel/workqueue.o
>> > objdump -Dr kernel/workqueue.o
>> >
>> > and copy/paste output for __cancel_work_timer function.
>> >
>>
>> It was a bit long, so I attached the full output.
>
> 216a: ff 14 25 00 00 00 00 callq *0x0
> 216d: R_X86_64_32S pv_irq_ops+0x8
> 2171: 4c 89 ef mov %r13,%rdi
> 2174: e8 00 00 00 00 callq 2179 <__cancel_work_timer+0x189>
> 2175: R_X86_64_PC32 flush_work-0x4
>
> So the ordering generated by compiler is correct here.
>
> This however:
>
> [ 24.824639] hardirqs last enabled at (7913): [<ffffffff8192a2b2>] _raw_spin_unlock_irq+0x32/0x60
> [ 24.824646] hardirqs last disabled at (7914): [<ffffffff81120e37>] del_timer_sync+0x37/0x110
>
> combined with the stacktrace above, doesn't still make too much sense; IRQ
> disable/enable is properly paired in del_timer_sync() and timer base lock
> locking.
>
> Could you look into the generated assembly whether the sequence
>
> local_irq_save(flags);
> lock_map_acquire(&timer->lockdep_map);
> lock_map_release(&timer->lockdep_map);
> local_irq_restore(flags);
>
> in del_timer_sync() hasn't been optimized in some damaging way?
>

OK, now I see what you mean that snippet...

[ kernel/time/timer.c ]
...
int del_timer_sync(struct timer_list *timer)
{
#ifdef CONFIG_LOCKDEP
unsigned long flags;

/*
* If lockdep gives a backtrace here, please reference
* the synchronization rules above.
*/
local_irq_save(flags);
lock_map_acquire(&timer->lockdep_map);
lock_map_release(&timer->lockdep_map);
local_irq_restore(flags);
#endif
...

So, I could try to see with CONFIG_LOCKDEP=n ?

- Sedat -

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/time/timer.c?id=v4.3-rc2#n1080
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/