Re: Sleeping BUG in khugepaged for i586

From: Larry Finger
Date: Tue Jun 06 2017 - 11:01:33 EST


On 06/06/2017 09:02 AM, Vlastimil Babka wrote:
On 06/05/2017 11:44 PM, Andrew Morton wrote:
On Sat, 3 Jun 2017 14:24:26 -0500 Larry Finger <Larry.Finger@xxxxxxxxxxxx> wrote:

I recently turned on locking diagnostics for a Dell Latitude D600 laptop, which
requires a 32-bit kernel. In the log I found the following:

BUG: sleeping function called from invalid context at mm/khugepaged.c:655
in_atomic(): 1, irqs_disabled(): 0, pid: 20, name: khugepaged
1 lock held by khugepaged/20:
#0: (&mm->mmap_sem){++++++}, at: [<c03d6609>]
collapse_huge_page.isra.47+0x439/0x1240
CPU: 0 PID: 20 Comm: khugepaged Tainted: G W

W means thre was WARN earler. Could be related... Got logs?

When I grabbed a splat, I got the last one in my log. The first one shows "Not tainted".


4.12.0-rc1-wl-12125-g952a068 #80

What is "wl-12125-g952a068"? What patches on top of mainline?

I found this while chasing a problem with one of the wireless drivers. For that reason I use Kalle Valo's wireless-testing-next, which happens to be the only kernel tree I have on this laptop. I'm reasonably certain that the extra updates are not the cause of the problem as the first one appears before any of the wireless drivers are loaded, but I will pull a clean copy of mainline to test that assumption.

Hardware name: Dell Computer Corporation Latitude D600
/03U652, BIOS A05 05/29/2003
Call Trace:
dump_stack+0x76/0xb2
___might_sleep+0x174/0x230
collapse_huge_page.isra.47+0xacf/0x1240
khugepaged_scan_mm_slot+0x41e/0xc00
? _raw_spin_lock+0x46/0x50
khugepaged+0x277/0x4f0
? prepare_to_wait_event+0xe0/0xe0
kthread+0xeb/0x120
? khugepaged_scan_mm_slot+0xc00/0xc00
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x21/0x30

I have no idea when this problem was introduced. Of course, I will test any
proposed fixes.


Odd. There's nothing wrong with cond_resched() while holding mmap_sem.
It looks like khugepaged forgot to do a spin_unlock somewhere and we
leaked a preempt_count.

Hmm I'd expect such spin lock to be reported together with mmap_sem in
the debugging "locks held" message?

My bisection of the problem is about half done. My latest good version is commit 7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go.

Larry