Re: [PATCH 1/5] locking: Introduce range reader/writer lock

From: Jan Kara
Date: Mon Apr 03 2017 - 12:07:02 EST


On Mon 03-04-17 08:26:43, Davidlohr Bueso wrote:
> On Mon, 03 Apr 2017, Laurent Dufour wrote:
> >Le Tue, 28 Mar 2017 09:39:18 -0700,
> >Davidlohr Bueso <dave@xxxxxxxxxxxx> a écrit :
> >>I'll wait to see if there are any more concerns and send a v2 with
> >>your corrections.
> >
> >Hi Bavidlohr, I think there is a major issue regarding the task
> >catching a signal in wait_for_range().
> >I can see it when a thread is catching a signal, the process deadlock
> >in exit path.
> >
> >Let's imagine all these tasks waiting for the complete range lock, so
> >range doesn't matter:
> >
> >A get the lock in write
> >B want the read lock => B->blocking_range=1 (because of A)
> >C want the write lock => C->blocking_range=2 (A,B)
> >D want the read lock => D->blocking_range=3 (A,B,C)
> >=> C catch a signal and exit wait_for_ranges()
> >A release the lock
> > => B->blocking_range=0
> > => D->blocking_range=2 (D has not seen C removal)
> >=> B get the lock
> >B release the lock
> > => D->blocking_range=1
> >
> >D remains blocked while no one has the lock !
> >
> >The issue is when removing a task from the interval tree, we
> >should decrement all the blocking_ranges of the task added to that
> >range after the one leaving... I can't see an easy fix for that :(
> >
> >Am I right ?
>
> Yes. Peter had also mentioned the issue too. One way I though of fixing
> the problem was to track the jiffies timestamp in a per range_rwlock
> basis for when it was added, and in the signal_pending() case, along with
> removing the lock from the tree, we iterate the tree again and decrement
> the blocking_ranges for those with a higher timestamp. It would add some
> overhead, but again this is the unlikely() case. It also adds an extra 8
> bytes of footprint, but this is usually stack allocated.

Or just a plain sequence counter of the lock operations? There are ways how
we could implement the functionality without the counter and although
asymptotically they will be the same, they are more complex code-wise so I
think the counter is the best solution.

Hoza

--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR