Re: [PATCH 1/2] dm-snapshot: fix crash with the realtime kernel

From: Scott Wood
Date: Wed Nov 13 2019 - 01:01:36 EST


On Tue, 2019-11-12 at 13:45 +0200, Nikos Tsironis wrote:
> On 11/12/19 9:50 AM, Mikulas Patocka wrote:
> >
> > On Mon, 11 Nov 2019, Mike Snitzer wrote:
> >
> > > On Mon, Nov 11 2019 at 11:37am -0500,
> > > Nikos Tsironis <ntsironis@xxxxxxxxxxx> wrote:
> > >
> > > > On 11/11/19 3:59 PM, Mikulas Patocka wrote:
> > > > > Snapshot doesn't work with realtime kernels since the commit
> > > > > f79ae415b64c.
> > > > > hlist_bl is implemented as a raw spinlock and the code takes two
> > > > > non-raw
> > > > > spinlocks while holding hlist_bl (non-raw spinlocks are blocking
> > > > > mutexes
> > > > > in the realtime kernel, so they couldn't be taken inside a raw
> > > > > spinlock).
> > > > >
> > > > > This patch fixes the problem by using non-raw spinlock
> > > > > exception_table_lock instead of the hlist_bl lock.
> > > > >
> > > > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > > > > Fixes: f79ae415b64c ("dm snapshot: Make exception tables
> > > > > scalable")
> > > > >
> > > >
> > > > Hi Mikulas,
> > > >
> > > > I wasn't aware that hlist_bl is implemented as a raw spinlock in the
> > > > real time kernel. I would expect it to be a standard non-raw
> > > > spinlock,
> > > > so everything works as expected. But, after digging further in the
> > > > real
> > > > time tree, I found commit ad7675b15fd87f1 ("list_bl: Make list head
> > > > locking RT safe") which suggests that such a conversion would break
> > > > other parts of the kernel.
> > >
> > > Right, the proper fix is to update list_bl to work on realtime (which
> > > I
> > > assume the referenced commit does). I do not want to take this
> > > dm-snapshot specific workaround that open-codes what should be done
> > > within hlist_{bl_lock,unlock}, etc.
> >
> > If we change list_bl to use non-raw spinlock, it fails in dentry lookup
> > code. The dentry code takes a seqlock (which is implemented as preempt
> > disable in the realtime kernel) and then takes a list_bl lock.
> >
> > This is wrong from the real-time perspective (the chain in the hash
> > could
> > be arbitrarily long, so using non-raw spinlock could cause unbounded
> > wait), however we can't do anything with it.
> >
> > I think that fixing dm-snapshot is way easier than fixing the dentry
> > code.
> > If you have an idea how to fix the dentry code, tell us.
>
> I too think that it would be better to fix list_bl. dm-snapshot isn't
> really broken. One should be able to acquire a spinlock while holding
> another spinlock.

That's not universally true -- even in the absence of RT there are nesting
considerations. But it would probably be good if raw locks weren't hidden
inside other locking primitives without making it clear (ideally in the
function names) that it's a raw lock.

> Moreover, apart from dm-snapshot, anyone ever using list_bl is at risk
> of breaking the realtime kernel, if he or she is not aware of that
> particular limitation of list_bl's implementation in the realtime tree.

In particular the non-rcu variant seems inherently bad unless you protect
traversal with some other lock (in which case why use bl at all?). Maybe
fully separate the rcu version of list_bl and keep using raw locks there
(with the name clearly indicating so), with the side benefit that
accidentally mixing rcu and non-rcu operations on the same list would become
a build error, and convert the non-rcu list_bl to use non-raw locks on RT.

BTW, I'm wondering what makes bit spinlocks worse than raw spinlocks on
RT... commit ad7675b15fd87f19 says there's no lockdep visibility, but that
seems orthogonal to RT, and could be addressed by adding a dep_map on debug
builds the same way the raw lock is currently added. The other bit spinlock
conversion commits that I could find are replacing them with non-raw locks.

-Scott