Re: 2.6.10-rc1-mm5 [u]

From: Martin Schlemmer [c]
Date: Sat Nov 13 2004 - 19:09:07 EST

Next message: Jan Dittmer: "[PATCH] media/video module_param conversion"
Previous message: Måns Rullgård: "Re: System call number"
In reply to: Andrew Morton: "Re: 2.6.10-rc1-mm5 [u]"
Next in thread: Jamie Lokier: "Re: 2.6.10-rc1-mm5 [u]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, 2004-11-13 at 15:24 -0800, Andrew Morton wrote:
> "Martin Schlemmer [c]" <azarah@xxxxxxxxxxxxxxxx> wrote:
> >
> > > Could you please try:
> > >
> > > wget ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10-rc1/2.6.10-rc1-mm5/broken-out/futex_wait-fix.patch
> > > patch -R -p1 < futex_wait-fix.patch
> > >
> > > the retest?
> >
> > Yep, this seems to fix it (usually one thread at least have already hung
> > at start for evo, now fine after a few mail fetches).
>
> OK, thanks.
>
> > Is this a regression, or as in the other thread an issue with evolution
> > that should be fixed ? (Note: gnome-btdownload also seemed to hang
> > overnight with -mm[45], but I do not know if its related)
>
> Don't know. It's hard to see why that patch would cause gross misbehaviour
> in evolution and apache. We may have to just revert it and take another
> look at the problem which it fixes.
>

I will be honest that I know next to nothing about relevant code, but
this looks fishy:

------
@@ -486,6 +486,8 @@
if (unlikely(ret != 0))
goto out_release_sem;

- queue_me(&q, -1, NULL);
-
/*
* Access the page after the futex is queued.
* We hold the mmap semaphore, so the mapping cannot have changed
------

The 'queue_me(&q, -1, NULL);' gets moved down, but just below where it
was we have a comment before a get_user() ... it is not possible that
because the patch cause the get_user() to be before queue_me(), that
we can have two (or more?) get_user()'s for the same futex at relatively
the same time?

> One thing I do note which is unrelated to this problem is that futex_wait()
> does get_user() inside down_read(mmap_sem). But a fault will do a second
> down_read(). And doing down_read() twice from within the same thread is a
> bug, because an intervening down_write() from another thread will cause
> deadlock.
>

I will rather leave this to somebody that now something about this =)

Thanks,

--
Martin Schlemmer

Attachment: signature.asc
Description: This is a digitally signed message part

Next message: Jan Dittmer: "[PATCH] media/video module_param conversion"
Previous message: Måns Rullgård: "Re: System call number"
In reply to: Andrew Morton: "Re: 2.6.10-rc1-mm5 [u]"
Next in thread: Jamie Lokier: "Re: 2.6.10-rc1-mm5 [u]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]