Re: About the try to remove cross-release feature entirely by Ingo

From: Byungchul Park
Date: Tue Jan 02 2018 - 21:38:47 EST


On 1/2/2018 1:00 AM, Theodore Ts'o wrote:
> On Mon, Jan 01, 2018 at 02:18:55AM -0800, Matthew Wilcox wrote:
> > > Clarification: all TCP connections that are used by kernel code would
> > > need to be in their own separate lock class. All TCP connections used
> > > only by userspace could be in their own shared lock class. You can't
> > > use one lock class for all kernel-used TCP connections, because of
> > > the problem where a Network Block Device is mounted on a local file
> > > system which is then exported via NFS and squirted out over yet
> > > another TCP connection.
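
For context: the reason all of these sockets fall into one bucket today
is that the socket lock class is keyed per address family, not per user
of the socket. A minimal sketch of that default keying, with names that
only approximate net/core/sock.c (treat it as an illustration, not a
quote of the real code):

#include <net/sock.h>
#include <linux/lockdep.h>

/* One static key shared by every TCP socket, kernel-owned or not, so
 * they all collapse into a single "sk_lock-AF_INET" class. */
static struct lock_class_key tcp_sk_key;
static struct lock_class_key tcp_slock_key;

static void example_sock_lock_init(struct sock *sk)
{
	sock_lock_init_class_and_name(sk,
			"slock-AF_INET", &tcp_slock_key,
			"sk_lock-AF_INET", &tcp_sk_key);
}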

> > So the false positive you're concerned about is write-comes-in-over-NFS
> > (with socket lock held), NFS sends a write request to local filesystem,
> > local filesystem sends write to block device, block device sends a
> > packet to a socket which takes that socket lock.
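
As a hedged illustration (every identifier below is invented for this
example, it is not real kernel code), the chain above looks roughly
like this, and with both sockets in one class lockdep sees sk_lock
being taken while sk_lock is already held:

#include <net/sock.h>
#include <linux/mutex.h>

static void nfs_write_reaches_nbd(struct sock *nfs_sk, struct sock *nbd_sk,
				  struct mutex *fs_lock,
				  struct mutex *nbd_queue_lock)
{
	lock_sock(nfs_sk);		/* write arrives over NFS             */
	mutex_lock(fs_lock);		/* local filesystem handles the write */
	mutex_lock(nbd_queue_lock);	/* block layer forwards it to NBD     */
	lock_sock(nbd_sk);		/* NBD resends it over its own socket */
	release_sock(nbd_sk);
	mutex_unlock(nbd_queue_lock);
	mutex_unlock(fs_lock);
	release_sock(nfs_sk);
}

The two sockets can never be the same lock at run time, but at the
class level this records sk_lock -> sk_lock, which is the false
positive being discussed.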

> It's not just the socket lock, but any of the locks/mutexes/"waiters"
> that might be taken in the TCP code path and below, including in the
> NIC driver.

> > I don't think we need to be as drastic as giving each socket its own lock
> > class to solve this. All NFS sockets can be in lock class A; all NBD
> > sockets can be in lock class B; all user sockets can be in lock class
> > C; etc.
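
A minimal sketch of that idea, using one static key per in-kernel user
of TCP rather than per socket. The helper name nbd_reclassify_socket()
is invented here; the mechanism is the usual lockdep re-keying of
sk_lock that some in-kernel socket users already apply to their
transport sockets:

#include <net/sock.h>
#include <linux/lockdep.h>

static struct lock_class_key nfs_sk_key, nfs_slock_key;	/* class A */
static struct lock_class_key nbd_sk_key, nbd_slock_key;	/* class B */
/* plain userspace sockets keep the default key: class C */

static void nbd_reclassify_socket(struct sock *sk)
{
	sock_lock_init_class_and_name(sk,
			"slock-AF_INET-NBD", &nbd_slock_key,
			"sk_lock-AF_INET-NBD", &nbd_sk_key);
}

Note this only re-keys the socket lock itself, which is exactly the
gap pointed out below: the locks taken deeper in the networking stack
and in the NIC driver are untouched.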

> But how do you know which of the locks taken in the networking stack
> are for the NBD versus the NFS sockets? What manner of horrific
> abstraction violation is going to pass that information all the way
> down to all of the locks that might be taken at the socket layer and
> below?

How is this "proper clasification" supposed to happen? It's the
repeated handwaving which claims this is easy which is rather
frustrating. The simple thing is to use a unique ID which is bumped
for each struct sock, each struct super, struct block_device, struct
request_queue, struct bdi, etc, but that runs into lockdep scalability
issues.
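
A hedged sketch of that per-object approach, just to make the
scalability point concrete: give every instance its own key, and every
instance becomes its own lock class. The struct and helper below are
invented for the example; on recent kernels a key that is not in
static storage would also need lockdep_register_key(), and every extra
class consumes fixed lockdep table space (MAX_LOCKDEP_KEYS and
friends), which is the scalability issue referred to above:

#include <net/sock.h>
#include <linux/lockdep.h>

struct my_sock {
	struct sock		sk;
	struct lock_class_key	key;	/* one lockdep class per socket */
};

static void my_sock_set_unique_class(struct my_sock *msk)
{
	lockdep_register_key(&msk->key);
	lockdep_set_class(&msk->sk.sk_lock.slock, &msk->key);
}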

This is what I meant by the group ID in the example I gave you before.
To make that work, the most important thing is to avoid running into
the lockdep scalability problem.

--
Thanks,
Byungchul