Re: Lockdep and rw_semaphores

From: Vladislav Bolkhovitin
Date: Tue Sep 13 2011 - 21:52:23 EST



Al Viro, on 09/13/2011 01:17 AM wrote:
> On Mon, Sep 12, 2011 at 10:19:20PM -0400, Vladislav Bolkhovitin wrote:
>
>> Sure, if nested write locking is involved, lockdep should loudly complain, but on
>> the first nested write, not read, because what's the point to complain on nested
>> reads? I may not have nested write locking at all (and don't have).
>
> _What_ nested write locking? For that to happen it's enough to have
> unrelated threads try to grab these suckers exclusive at the wrong time.
> One thread for each lock. Separately. No nesting involved.
>
> And if you never grab these locks exclusive (again, not nested - anything
> grabbing them exclusive will suffice), why the fsck do you have them at
> all? rwsem that is never grabbed exclusive is rather pointless, isn't it?
>
>> Plus, the deadlock scenario lockdep described for me
>>
>> CPU0 CPU1
>> ---- ----
>> lock(QQ1_sem);
>> lock(QQ_sem);
>> lock(QQ1_sem);
>> lock(QQ_sem);
>>
>> *** DEADLOCK ***
>>
>> is simply wrong. Such deadlock can never happen, correct?
>
> Sigh... OK, let's spell it out:
>
> thread 1:
> down_read(&A); /* got it */
> thread 2:
> down_read(&B); /* got it */
> thread 3:
> down_write(&A); /* blocked until thread 1 releases A */
> thread 4:
> down_write(&B); /* blocked until thread 2 releases B */
> thread 1:
> down_read(&B); /* blocked until thread 4 gets and releases B */
> thread 2:
> down_read(&A); /* blocked until thread 3 gets and released A */
>
> and we have a deadlock. Note that thread 3 and thread 4 are just doing
> solitary down_write(), no other locks held. And if they happen to come
> a bit later, we won't get stuck.
>
> Again, if your code does that kind of out-of-order down_read(), it is broken.
> Period. Either there's literally nothing that would ever do down_write()
> (in which case you can remove the useless rwsem completely), or you can get
> a full-blown deadlock there. It's harder to hit (you need 4 tasks instead
> if 2, as you would with mutex_lock() out of order), but it's still quite
> triggerable.
>
> What you seem to be missing is that rwsems are fair to writers. I.e.
> down_read() blocks not only when there's a writer already holding the lock;
> it blocks when there's a writer blocked trying to acquire the lock. That's
> what makes this kind of out-of-order down_read() unsafe; unrelated thread
> doing a solitary down_write() is able to wedge the whole thing stuck. Without
> having acquired *any* locks - just waiting for readers to go away so it could
> get the damn thing exclusive.

Al, David,

I see your point and agree, thanks for the explanation. But my original points are
still valid:

1. Reverse read locking isn't always a deadlock. For instance, if only 1 write
thread participating and doesn't do nested write locking, which is a quite valid
scenario, because by design of rw locks they are used with many readers and
limited amount of rare writers.

2. The "deadlock" explanation lockdep is producing in such cases is wrong and
confusing, because in the explanation scenario there is no deadlock. If to do such
nice and helpful explanations, better to do it correctly.

So, it should be better if this warning is issued, if there is >1 thread write
locking detected on any participated rw lock, and illustrated with a correct
explanation.

Thanks,
Vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/