Re: 2.6.25-git2: BUG: unable to handle kernel paging request at ffffffffffffffff

From: Jiri Slaby
Date: Tue Apr 22 2008 - 05:49:21 EST


Linus Torvalds wrote:

On Tue, 22 Apr 2008, Rafael J. Wysocki wrote:
The same place, dentry.d_hash.next is 1. No slub debug clues... I think I'll give slab a try. Any other clues?
Well, SLUB uses some per-CPU data structures. Is it possible that they get
corrupted, which leads to the observed symptoms?

It really doesn't look like the slub allocations themselves would be corrupted. It very much looks like wild pointers corrupting allocations that themselves were fine.

Hmm, correct.

What do you do to trigger this? Any particular load? Is it still just doing suspend/resume, or do you have something else that you are playing with?

Yesterday I did 2 suspend/resume cycles after 1 hour of uptime, then ran git-status, which was killed after a fraction of a second. So I can reliably reproduce it when I suspend, resume and produce some I/O load. I guess it's time to bisect 2.6.25-rc8-mm2, since that is where I can reproduce it best, and I haven't seen the bug in -rc8-mm1 in over a week of suspending and working.

Also, have you tried CONFIG_DEBUG_PAGEALLOC? That can also be a very powerful way to find memory corruption.

Not yet.

Does anybody see any other patterns? Looking at the modules linked in in the oopses from Zdenek, Rafael and Jiri, I don't see anything odd. You all have 80211 support, maybe the corruption comes from the wireless layer?

Maybe, but I don't use that stack. It's a desktop machine; the wireless is only sitting there, not turned on, but sure, it's loaded.