Re: [PATCH] dax: allow DAX to look up an inode's block device

From: Dan Williams
Date: Tue Feb 02 2016 - 18:38:23 EST


[ adding btrfs ]

On Tue, Feb 2, 2016 at 3:19 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Feb 02, 2016 at 04:11:42PM -0700, Ross Zwisler wrote:
>
>> However, for raw block devices and for XFS with a real-time device, the
>> value in inode->i_sb->s_bdev is not correct. With the code as it is
>> currently written, an fsync or msync to a DAX enabled raw block device will
>> cause a NULL pointer dereference kernel BUG. For this to work correctly we
>> need to ask the block device or filesystem what struct block_device is
>> appropriate for our inode.
>>
>> To that end, add a get_bdev(struct inode *) entry point to struct
>> super_operations. If this function pointer is non-NULL, this notifies DAX
>> that it needs to use it to look up the correct block_device. If
>> i_sb->get_bdev() is NULL DAX will default to inode->i_sb->s_bdev.
>
> Umm... It assumes that bdev will stay pinned for as long as inode is
> referenced, presumably? If so, that needs to be documented (and verified
> for existing fs instances). In principle, multi-disk fs might want to
> support things like "silently move the inodes backed by that disk to other
> ones"...

I assume btrfs is the only fs we have that might reassign the bdev for
a given inode on the fly? Hopefully we don't need anything stronger
than rcu_read_lock() to pin the result as valid.

At least in this case the initial user is dax-fsync where the
->get_bdev() answer should be static for the life of the inode, and
btrfs does not currently interface with dax. But yes, we need to get
the expected semantics clear.