Re: [patch 0/5] lightweight robust futexes: -V1

From: Ingo Molnar
Date: Wed Feb 15 2006 - 15:43:42 EST



* Andi Kleen <ak@xxxxxxx> wrote:

> On Wednesday 15 February 2006 18:50, Ulrich Drepper wrote:
> > Andi Kleen wrote:
> > > e.g. you could add a new VMA flag that says "when one user
> > > of this dies unexpectedly by a signal kill all"
> >
> > "kill all"?
>
> It would solve the problem statement given by Ingo in the rationale
> for this kernel patch - cleaning up after a hanging yum.

no, it wouldnt solve it. How do you know in userspace that the futex got
orphan? Say the futex value is stuck at '2'. How do you know it's hung
due to the premature exit of its owner?

Premature exit while holding locks is a fundamental property of the
locking abstraction - which, in the case of futex based locks, cannot be
implemented without kernel help. It's a tough problem that we struggled
with for years to solve properly.

> If there are any other problems this is intended to solve then they
> should be stated in the rationale.

i really only wrote that quick and light preamble to introduce the topic
- not to start any discussion whether the topic exists. The topic
obviously exists, check out pthread_mutex_t robustness APIs in other
OSs:

http://docs.sun.com/app/docs/doc/816-5137/6mba5vpjf?a=view#sync-103
http://docs.sun.com/app/docs/doc/816-5137/6mba5vpjg?a=view#sync-116

the vma-based robust-futex patches have been circulating on lkml for
quite some time, they were even in -mm for some time. Check out:

http://developer.osdl.org/dev/robustmutexes/

for a background. To make sure: our patch handles the 'robust' portion,
not the PI portion - just like the previous vma-based patches previously
sent to lkml. I.e. robustness comes first - it's a feature orthogonal to
priority inheritance.

just in case you were wondering: this is not an ad-hoc topic we made up
:-|

> > > And what happens if the patch is rejected? I don't really think you
> > > can force patches in this way ("do it or I break glibc")
> >
> > Nothing which relies on the syscalls goes into cvs unless the kernel
> > side is first committed. I never do this.

Andi, what were you thinking by suggesting that we want to "force
patches"? Really, have we ever done that? Ulrich is known to be very
strict about ABI issues, and he probably wont even trust Linus himself
saying that a syscall will go in - he'll only trust the commit log.

Frankly, i thought we are offering a cool new feature as a solution to a
hard problem, and i'm really startled (and saddened) to see such a
default assumption of malice from you :-( I dont think we deserved that.

> > > What happens when the list gets corrupted? Does the kernel go
> > > into an endless loop? Kernel going through arbitary user
> > > structures like this seems very risky to me. There are ways to do
> > > list walking with cycle detection, but they still have quite
> > > bad worst case detection times.
> >
> > The list being corrupted means that the mutexes are corrupted. In
> > which case the application would crash anyway.
>
> I'm not concerned about the application, just about the kernel.

i said it multiple times in the announcement:

> > > > If the thread/process crashed or terminated in some incorrect
> > > > way then the list might be non-empty: in this case the kernel
> > > > carefully walks the list [not trusting it],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
and:

> > > > i've tested the new syscalls on x86 and x86_64, and have made
> > > > sure the parsing of the userspace list is robust [ ;-) ]
> > > > even if the list is deliberately corrupted.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You could even have read the code - the function is called
exit_robust_list(), and is commented quite extensively:

+/*
+ * Walk curr->robust_list (very carefully, it's a userspace list!)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ * and mark any locks found there dead, and notify any waiters.
+ *
+ * We silently return on any sign of list-walking problem.
+ */
+void exit_robust_list(struct task_struct *curr)

again, i'm startled at your clearly baseless negative reaction :-(

> > As for the "endless loop". You didn't read the code, it seems. There
> > are two mechanisms to prevent this: the list is destroyed when the
> > individual elements are handled and there is an upper limit on the
> > number of robust mutexes which can be registered. The limit is
> > ridiculously high so it'll no problem for correct programs and it also
> > will eliminate run-away list following code.
>
> Ok good that's handled. How about long blocking on swapped out pages
> in exit?

We already touch userspace pages in do_exit() (and had been for years),
so we may already 'block' on a swapped out page - which will become
swapped in and then the kernel continues. The maximum amount of delay
depends on how many pages are touched. You can think of the list-walking
as a 'preamble' to the thread exiting - it could have been triggered by
userspace, straight before the do_exit() was executed.

Furthermore, there is a limit on the maximum number of list entries
walked (ROBUST_LIST_LIMIT) - i set it to a quite high number [to be able
to test performance], but we could reduce it significantly.
Realistically there wont be more than say a dozen locks held at exit
time.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/