Re: kernel BUG at fs/ext/super.c:428

From: Darren Hart
Date: Mon Jan 26 2009 - 11:39:40 EST


Peter Zijlstra wrote:
On Wed, 2009-01-21 at 12:10 -0800, Pallipadi, Venkatesh wrote:
On Wed, 2009-01-14 at 13:20 -0800, Theodore Tso wrote:
On Wed, Jan 14, 2009 at 08:29:37PM +0100, Peter Zijlstra wrote:
38d47c1b7075bd7ec3881141bb3629da58f88dab is first bad commit
commit 38d47c1b7075bd7ec3881141bb3629da58f88dab
Author: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Date: Fri Sep 26 19:32:20 2008 +0200

futex: rely on get_user_pages() for shared futexes

On the way of getting rid of the mmap_sem requirement for shared futexes,
start by relying on get_user_pages().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Acked-by: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

However does a futex change make ext3 crap its pants?
I agree, this doesn't make much sense. I've looked at the patch, and
I don't see how this would cause an ext3 orphaned-inode list handling
problem

Are you sure the bisect was done correctly? Have you tried reverting
that one commit, or otherwise conclusively shown that a kernel with
this commit fails, and one without this commit works just fine?

Unfortunately, I cannot revert this patch alone from upstream git.
But I consistently see
upstream git: Always produces this oops on reboot
checkout of ï38d47c1b: Always produces this oops on reboot
checkout of ïï94aca1da (one patch before the above commit): Reboots fine
without the oops.

This is petty specific to the particular userspace, looks like.
I only see this on SLES10 installation. Also, I need a non-root user
logged in at least once after boot through X to see this problem. I was
always seeing this as I had autologin on local terminal and was remotely
rebooting the system. If I just boot to init 3 or boot to init 5 with no
user logged in or boot to init 5 with root logged in, I do not see this
problem.

Ted, could this happen due an extra iput()?

In that case, Venki, does the below patch fix it?

Credit goes to Darren for spotting this.

---
kernel/futex.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f89d373..f4132ab 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -929,7 +929,7 @@ out_unlock:

/* drop_futex_key_refs() must be called outside the spinlocks. */
while (--drop_count >= 0)
- drop_futex_key_refs(&key1);
+ drop_futex_key_refs(&key2);

Unfortunately, I realized later that this code was indeed correct and I asked Ingo to pull my patch implementing the above change. Quoting my previous mail on the subject:

"I believe what is happening here is that the requeue loop requeues each waiter from one futex (key1) to another (key2). It rightly takes a reference to the futex at key2 and then decrements the references to key1 by drop_count (since the waiters now reference key2, not key1). The newly taken key2 references will be dropped in futex_wait() when each waiter is woken up and takes the futex."

However, there are still two patches in linux-tip/core/futexes that addresses get|put symmetry of futex keys:

90621c40cc4ab7b0a414311ce37e7cc7173403b6
42d35d48ce7cefb9429880af19d1c329d1554e7a

However, the first is an addition of a WARN_ON (which is unlikely to catch this issue as it was geared toward catching puts on failed gets). The latter mostly adds puts where they were missing, so also unlikely to help.

--
Darren




out_put_keys:
put_futex_key(fshared, &key2);




--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/