Re: e1000_netpoll() , BUG: sleeping function called from invalid context

From: Gabriel C
Date: Fri Feb 17 2017 - 18:44:34 EST




On 18.02.2017 00:25, Cong Wang wrote:
On Fri, Feb 17, 2017 at 3:16 PM, Gabriel C <nix.or.die@xxxxxxxxx> wrote:


On 17.02.2017 23:38, Cong Wang wrote:

On Fri, Feb 17, 2017 at 1:20 PM, Gabriel C <nix.or.die@xxxxxxxxx> wrote:

Hi all,

while poking at a different issue I found the following in my logs:

[85362.132770] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
[85362.132771] in_atomic(): 1, irqs_disabled(): 1, pid: 1153, name: systemd-journal
[85362.132772] no locks held by systemd-journal/1153.
[85362.132772] irq event stamp: 60088359
[85362.132777] hardirqs last enabled at (60088359): [<ffffffff810d07c2>] vprintk_emit+0x432/0x470
[85362.132779] hardirqs last disabled at (60088358): [<ffffffff810d03ec>] vprintk_emit+0x5c/0x470
[85362.132782] softirqs last enabled at (60088258): [<ffffffff810688fd>] __do_softirq+0x22d/0x290
[85362.132784] softirqs last disabled at (60088233): [<ffffffff81068c0a>] irq_exit+0x6a/0xd0
[85362.132784] Preemption disabled at:
[85362.132787] [<ffffffff815203de>] write_msg+0x4e/0xf0
[85362.132790] CPU: 0 PID: 1153 Comm: systemd-journal Tainted: G I 4.10.0-rc8-debug-00001-ga1015e374d94-dirty #5
[85362.132791] Hardware name: FUJITSU PRIMERGY TX200 S5 /D2709, BIOS 6.00 Rev. 1.14.2709 02/04/2013
[85362.132792] Call Trace:
[85362.132796] dump_stack+0x86/0xc1
[85362.132799] ___might_sleep+0x213/0x230
[85362.132801] __might_sleep+0x6b/0x80
[85362.132803] synchronize_irq+0x33/0x90
[85362.132805] ? __irq_put_desc_unlock+0x19/0x40
[85362.132807] ? __disable_irq_nosync+0x4e/0x60
[85362.132808] disable_irq+0x17/0x20



Hmm, your kernel base version is 4.10.0-rc8, but the symbols here
look like they predate my commit: with my commit, the trace should
show disable_hardirq() calling synchronize_hardirq().

Did you revert it or make any local changes?


The kernel is -rc8 with d966564fcdc19e13eb6ba1fbe6b8101070339c3d reverted
+ http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=202461e2f3c15dbfb05825d29ace0d20cdf55fa4
+ a debug patch from Thomas to find these goldfish issues
(http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/goldfish-debug.patch).

No other changes..

That is weird, the stack trace doesn't match the source code for some reason.
Can you objdump your e1000.ko module to see if that is true?


My card seems to use the e1000e driver, which is built-in.

Anyway, here is an objdump -x:

http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/objdump-x_e1000.ko.txt