Re: Killing POSIX deadlock detection

From: Eric W. Biederman
Date: Sun Jun 06 2004 - 15:12:33 EST


Trond Myklebust <trond.myklebust@xxxxxxxxxx> writes:

> On Sun, 06/06/2004 at 09:27, Matthew Wilcox wrote:
>
> > > T1 locks file F1 -> lock (P1, F1)
> > > P2 locks file F2 -> lock (P2, F2)
> > > P2 locks file F1 -> blocks against (P1, F1)
> > > T1 locks file F2 -> blocks against (P2, F2)
> >
> > Less contrived example -- T2 locks file F2. We report deadlock here too,
> > even though T1 is about to unlock file F1.

There is a fairly sane Linux-specific definition here. We should
track these things not by pid or tid, but by struct files_struct.

> So what is better: report an error and give the user a chance to
> recover, or allowing the potential deadlock?

Reading the SUS definition below we should only report a deadlock when
it is certain.

For multiple processes with the same set of file descriptors open,
that is an interesting graph problem: a deadlock is only certain when
there is nothing another process can do to remove the deadlock situation.

> Only the user can resolve problems such as the above threaded problem,
> given the SuS definitions.
>
> > So, final call. Any objections to never returning -EDEADLK?
>
> Yes: As Chuck points out, that is a fairly nasty change of the userland
> API.

???? Failing to detect a deadlock is not a change in the API.
It is simply a change in behavior.

> Worse: it is a change that fixes only one problem for only a minority of
> users (those that combine locking over multiple NPTL threads - a
> situation which after the "fix" remains just as poorly defined) at the
> expense of reintroducing a series of deadlocking problems for those
> single threaded users that rely on the EDEADLK (and have done so
> throughout the entire 2.4.x series).

Relying on EDEADLK is broken. That is about as bad as relying on
getting -EACCES instead of SIGSEGV.

Detecting deadlocks is certainly a quality of implementation issue.
But unless my memory is shaky, detecting deadlocks is a hard problem.

Perhaps what we should do is simply not attempt to detect deadlocks
involving threaded processes.

With threads the problem escalates from one of cycle detection
to something fairly weird.
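
To make the cycle detection part concrete, here is a minimal sketch in
Python (not kernel code). It assumes a simplified model in which every
blocked lock owner waits on exactly one other owner; the function name
would_deadlock and the waits_for table are hypothetical, chosen only for
illustration. It replays the scenario quoted above and shows that the
answer depends on what is treated as the "owner" of a lock.

```python
def would_deadlock(waits_for, requester, holder):
    """Return True if blocking `requester` on `holder` would close a cycle.

    `waits_for` maps each blocked owner to the owner it is waiting on.
    """
    seen = set()
    current = holder
    while current is not None and current not in seen:
        if current == requester:
            return True
        seen.add(current)
        current = waits_for.get(current)
    return False


# Single-threaded case: P2 already waits on P1, now P1 asks for P2's lock.
single_threaded = would_deadlock({"P2": "P1"}, "P1", "P2")

# Threaded case with owners keyed by pid: thread T2 of P1 asks for F2,
# which P2 holds. The walk sees owner "P1" again and reports a deadlock,
# although thread T1 is still runnable and could unlock F1.
threaded_by_pid = would_deadlock({"P2": "P1"}, "P1", "P2")

# Same situation with owners keyed per thread: P2 waits on T1, T2 asks.
threaded_by_tid = would_deadlock({"P2": "T1"}, "T2", "P2")

print(single_threaded, threaded_by_pid, threaded_by_tid)
```

The walk itself is cheap. The hard part with threads is that neither
keying is obviously right: keyed by pid, the sketch reports a deadlock
that T1 could still break, and keyed by tid it no longer matches how the
locks are actually owned.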

> Finally, EDEADLK does actually appear to be mandatory to implement in
> SUSv3, given that it states:
>
> A potential for deadlock occurs if a process controlling a
> locked region is put to sleep by attempting to lock another
> process' locked region. If the system detects that sleeping
> until a locked region is unlocked would cause a deadlock,
> fcntl() shall fail with an [EDEADLK] error.
>
> (again see
> http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html)

Hmm. I don't see that the system is required to detect a deadlock.
Just what it does after it has detected one.
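
Under that reading, a caller has to cope with both outcomes: the lock
request may fail with EDEADLK, or it may simply block. A minimal
userspace sketch in Python, assuming a Unix system where fcntl.lockf()
issues a blocking fcntl() lock request; the helper name lock_whole_file
is hypothetical:

```python
import errno
import fcntl
import tempfile


def lock_whole_file(fd):
    """Request a blocking exclusive POSIX lock on the whole file.

    Returns True once the lock is granted, False if the system
    reported EDEADLK. Any other error is re-raised.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until granted or refused
    except OSError as exc:
        if exc.errno == errno.EDEADLK:
            return False
        raise
    return True


# Uncontended use: nobody else holds the lock, so it is granted at once.
with tempfile.TemporaryFile() as f:
    got_lock = lock_whole_file(f.fileno())
    fcntl.lockf(f.fileno(), fcntl.LOCK_UN)
```

Code written this way keeps working whether or not the system detects a
given deadlock, since it never treats the absence of EDEADLK as a
guarantee.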

Eric