Re: splice vs execve lockdep trace.

From: Linus Torvalds
Date: Tue Jul 16 2013 - 16:18:15 EST


On Tue, Jul 16, 2013 at 12:33 PM, Ben Myers <bpm@xxxxxxx> wrote:
>> >
>> > And looking more at that, I'm actually starting to think this is an
>> > XFS locking problem. XFS really should not call back to splice while
>> > holding the inode lock.

.. that was misleading, normally "inode lock" would be i_lock, but
here I meant the XFS-specific i_iolock.

But you clearly picked up on it:

> CPU0 CPU1 CPU2
> ---- ---- ----
> lock(&sig->cred_guard_mutex);
> lock(&pipe->mutex/1);
> lock(&(&ip->io_lock)->mr_lock);
> lock(&(&ip->io_lock)->mr_lock);
> lock(&sig->cred_guard_mutex);
> lock(&pipe->mutex/1);

Yup.

> I agree that fixing this in XFS seems to be the most reasonable plan,
> and Dave's approach looks ok to me. Seems like patch 1 should go
> through Al's tree, but we'll also need to commit it to the xfs tree
> prerequisite to patch 2.

Btw, is there some reason why XFS couldn't just use
generic_file_splice_read() directly?

I'm not arguing against Dave's patches, since the splice_write case
would seem to want them regardless, but I'm not even seeing why XFS
couldn't just avoid the locking for the splice_read case entirely..And
the generic file-splice read function does all that read-ahead stuff
and should actually be better for performance. And because it does it
from the page cache, it can avoid the locking issues (because it gets
the splice pipe lock later, just holding on to the page references).

And splice has mmap semantics - the whole point of splice is about
moving pages around, after all - so I *think* your current
"xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);" is actually over-serialization.
The pages will be shared by the pipe buffers anyway, so splicing by
definition doesn't imply full data serialization (because the reading
of the data itself will happen much later).

So the per-page serialization done by readpage() should already be
sufficient, no?

I dunno. Maybe there's something I'm missing. XFS does seem to wrap
all the other generic functions in that lock too, but since mmap() etc
clearly have to be able to get things one page at a time (without any
wrapping at higher layers), I'm just wondering whether splice_read
might not be able to avoid it.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/