Re: [PATCH] lockdep: Add a document describing crossrelease feature
From: Peter Zijlstra
Date: Fri Jul 01 2016 - 06:47:00 EST
So I really could not understand your initial changelogs, this text
seem to be somewhat better, so let me try and comment on this.
On Fri, Jul 01, 2016 at 01:15:38PM +0900, Byungchul Park wrote:
> +++ b/Documentation/locking/crossrelease.txt
> @@ -0,0 +1,276 @@
> +Crossrelease lock dependency check
> +==================================
> +
> +Started by Byungchul Park <byungchul.park@xxxxxxx>
> +
> +Contents:
> +
> + (*) What is a problem?
> +
> + - Original lockdep's assumptions.
> + - Original lockdep's limitation.
Their form doesn't make sense if we ever commit this. Nobody knows or
cares about an 'original' lockdep. There is only now.
> +Original lockdep's assumptions
> +------------------------------
> +
> +Original lockdep (not crossrelease featured lockdep) assumes that,
> +
> +1. A lock will be unlocked within the context holding the lock.
This is lock owner semantics; that is, each lock has a clear owner.
Which is a sane assumption, and a hard requirement for PI. Remember, all
this comes from the RT tree.
!owner locks cannot do PI and thus cannot be used for code you want to
provide deterministic behaviour with.
> +2. A lock has dependency with all locks already held in held_locks.
That's not really an assumption, given 1, this is a fact. An owner lock
can only depend on locks currently held. This is a corner stone of
proving things.
> +2. Acquiring is more important than releasing, to check its dependency.
s/2/3/
That's not an assumption; that's a hard requirement. Since the acquire
is the one blocking, you _have_ to check for cycles before you block,
otherwise you'll hit the deadlock and not get a report, because you're
deadlocked.
> +Original lockdep's limitation
> +-----------------------------
> +
> +Therefore, the original lockdep has limitations. It can be applied only
> +to typical lock operations, e.g. spin_lock, mutex, semaphore and the
This is wrong, semaphores are very much not covered by lockdep since
they do not have owner semantics (what you call crossmuck). (this is
the distinction between a binary semaphore and a mutex)
> +What causes deadlock
> +--------------------
> +
> +Not only lock operations, but also any operations causing to wait or
> +spin it e.g. all wait operations for an event, lock_page() and so on
> +can cause deadlock unless it's eventually released by someone. The most
> +important point here is that the waiting or spinning must be *released*
> +by someone. In other words, we have to focus whether the waiting and
> +spinning can be *released* or not to avoid deadlock, rather than
> +waiting or spinning it itself.
But since its the blocking that _is_ the deadlock, you'll never get your
report.
IOW, you rely on future behaviour to tell if now can make forwards
progress. This already implies a well formed program. You're inverting
causality.
> +Relax the assumptions
> +---------------------
> +
> +We can relax the assumtions the original lockdep has, which is not
> +necessary to check dependency and detect a deadlock.
> +
> +1. A lock can be unlocked in any context, unless the context itself
> + causes a deadlock e.g. acquiring a lock in irq-safe context before
> + releasing the lock in irq-unsafe context.
You fail to say how this preserves correctness. By relaxing this you
loose the held_lock dependencies and you destroy the entire proof that
currently underpins lockdep.
> +2. A lock has dependency with all locks in the releasing context, having
> + been held since the lock was held.
But you cannot tell this. The 'since the lock was held' thing fully
depends on timing and is not fundamentally correct.
lock(A)
unlock(A)
lock(A)
wait_for(B)
unlock(A)
wake(B)
Between the wait_for(B) and wake(B), _nothing_ has been held, yet still
there's the deadlock potential.
And note that if the timing was 'right', you would never get to wake(B)
because deadlock, so you'd never establish that there would be a
deadlock.
> Thus we can check the dependency
> + only after we identify the releasing context at first. Of course,
> + if we consider only typical lock e.g. spin lock, mutex, semaphore
> + and so on, then we can identify the releasing context at the time
> + acquiring a lock because the releasing context is same as the
> + releasing context for the typical lock. However, generally we have to
> + wait until the lock having been held will be eventually released to
> + identify the releasing context. We can say that the original lockdep
> + is a special case among all cases this crossrelease feature can deal
> + with.
I'm not sure you can say this at all; you've no proof of correctness
from which this special case flows.
> +3. Releasing is more important than acquiring to check its dependency.
> + Compare to the third assumption of original lockdep.
Again, you're inverting causality afaict. You depend on the future
happening to say now is correct.
> +Introduce "crosslock"
> +---------------------
> +
> +Crossrelease feature names a lock "crosslock" if it is releasable by a
> +different context from the context having acquired the lock. All locks
> +having been held in the context unlocking the crosslock until
> +eventually the crosslock will be unlocked, have dependency with the
> +crosslock. That's the key idea to implement crossrelease feature.
_all_ locks? That implies infinite storage, which is hardly feasible. If
you limit it, the limit would seem arbitrary and you loose your proof
(in so far as I can see, because you're not actually giving any).
Please, give a coherent, mathematical proof of correctness.
Because I'm not seeing how this thing would work.