Re: shared/298 lockdep splat?
From: Dave Chinner
Date: Mon Sep 25 2017 - 23:55:03 EST
On Thu, Sep 21, 2017 at 05:47:14PM +0900, Byungchul Park wrote:
> On Thu, Sep 21, 2017 at 08:22:56AM +1000, Dave Chinner wrote:
> > Peter, this is the sort of false positive I mentioned was likely to
> > occur without some serious work to annotate the IO stack to prevent
> > them. We can nest multiple layers of IO completions and locking in
> > the IO stack via things like loop and RAID devices. They can be
> > nested to arbitrary depths, too (e.g. loop on fs on loop on fs on
> > dm-raid on n * (loop on fs) on bdev) so this new completion lockdep
> > checking is going to be a source of false positives until there is
> > an effective (and simple!) way of providing context based completion
> > annotations to avoid them...
>
> Hello,
>
> It looks like this is caused by &ret.event in submit_bio_wait() being
> initialized with the same lockdep class for all layers. I mean that
> completion variables in different layers should be initialized with
> different classes, as you do for typical locks in XFS.
Except that submit_bio_wait() is generic block layer functionality
and can be used by anyone. Whatever solution you decide on, it has
to be generic. And keep in mind that any code that submits a bio
itself and waits on a completion event from that bio is going to
have to do its own annotations, which makes this a real PITA.
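
For reference, submit_bio_wait() is roughly the pattern below - I'm
paraphrasing from memory against a 4.13-ish tree, so check the source
before quoting me. The single init_completion() call site is exactly
why every waiter ends up in the same lockdep class, no matter which
layer of the stack the bio belongs to:

struct submit_bio_ret {
	struct completion event;
	int error;
};

static void submit_bio_wait_endio(struct bio *bio)
{
	struct submit_bio_ret *ret = bio->bi_private;

	ret->error = blk_status_to_errno(bio->bi_status);
	complete(&ret->event);
}

int submit_bio_wait(struct bio *bio)
{
	struct submit_bio_ret ret;

	init_completion(&ret.event);	/* one call site, one class */
	bio->bi_private = &ret;
	bio->bi_end_io = submit_bio_wait_endio;
	bio->bi_opf |= REQ_SYNC;
	submit_bio(bio);
	wait_for_completion_io(&ret.event);

	return ret.error;
}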
> I am not sure whether I understand correctly how XFS works. Is that
> right? If so, how can we distinguish between independent bios in
> submit_bio_wait()? With that answer, you or I could make it work, no?
This has nothing to do with XFS - XFS has no clue where it sits in
the block device stack, and it has no business screwing with bio
internals or stack layering to work around issues with stacked block
devices....
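
If you want a concrete idea of what "context based completion
annotations" would have to look like, think of the per-instance
lock_class_key trick we already use for regular locks, except the
key would have to be owned by the layer instance the bio is being
submitted to. A purely hypothetical sketch - init_completion_key()
does not exist, it's the missing hook we're talking about:

/*
 * Hypothetical: one lock_class_key per stacked device instance, so
 * completions waited on at different layers get different classes.
 */
struct stacked_dev {
	struct lock_class_key	compl_key;
	/* ... */
};

static void stacked_dev_end_io(struct bio *bio)
{
	complete(bio->bi_private);
}

static void stacked_dev_rw_sync(struct stacked_dev *sd, struct bio *bio)
{
	struct completion done;

	/* hypothetical hook: key this completion to our layer */
	init_completion_key(&done, &sd->compl_key);

	bio->bi_private = &done;
	bio->bi_end_io = stacked_dev_end_io;
	submit_bio(bio);
	wait_for_completion_io(&done);
}

And even that doesn't help generic code like submit_bio_wait(), which
has no layer instance to hang a key off - which is the whole problem.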
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx