Re: regression introduced by "block: Add support for DAX reads/writes to block devices"

From: Linda Knippers
Date: Wed Aug 05 2015 - 21:43:12 EST


On 08/05/2015 06:01 PM, Dave Chinner wrote:
> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
>> Hi, Matthew,
>>
>> Linda Knippers noticed that commit (bbab37ddc20b) breaks mkfs.xfs:
>>
>> # mkfs -t xfs -f /dev/pmem0
>> meta-data=/dev/pmem0 isize=256 agcount=4, agsize=524288 blks
>> = sectsz=512 attr=2, projid32bit=1
>> = crc=0 finobt=0
>> data = bsize=4096 blocks=2097152, imaxpct=25
>> = sunit=0 swidth=0 blks
>> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
>> log =internal log bsize=4096 blocks=2560, version=2
>> = sectsz=512 sunit=0 blks, lazy-count=1
>> realtime =none extsz=4096 blocks=0, rtextents=0
>> mkfs.xfs: read failed: Numerical result out of range
>>
>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>> from the last sector of the device. This results in dax_io trying to do
>> a page-sized I/O at 512 bytes from the end of the device.
>
> Right - we have to be able to do IO to that last sector, so this is
> a sanity check to tell if the block dev is large enough. The XFS
> kernel code does the same end-of-device sector read when the
> filesystem is mounted, too.
>
>> bdev_direct_access, receiving this bogus pos/size combo, returns
>> -ERANGE:
>>
>> if ((sector + DIV_ROUND_UP(size, 512)) >
>> part_nr_sects_read(bdev->bd_part))
>> return -ERANGE;
>>
>> Given that file systems supporting dax refuse to mount with a blocksize
>> != page size, I'm guessing this is sort of expected behavior. However,
>> we really shouldn't be breaking direct I/O on pmem devices.
>
> If the device is advertising 512 byte sector size support, then this
> needs to work, especially as DAX is completely transparent on the
> block device. Remember that DAX through a filesystem works on
> filesystem data block size boundaries, so a 512 byte sector/4k block
> size filesystem will be able to use DAX for mmapped files just fine.
>
>> So, what do you want to do? We could make the pmem device's logical
>> block size fixed at the system page size. Or, we could modify the dax
>> code to work with blocksize < pagesize. Or, we could continue using the
>> direct I/O codepath for direct block device access. What do you think?
>
> I don't know how the pmem device sets up its limits. Can you post
> the output of:
>
> /sys/block/pmem0/queue/logical_block_size
512

> /sys/block/pmem0/queue/physical_block_size
512

> /sys/block/pmem0/queue/hw_sector_size
512

> /sys/block/pmem0/queue/minimum_io_size
512

> /sys/block/pmem0/queue/optimal_io_size
0

Let me know if you need anything else.

-- ljk


> As these all affect how mkfs.xfs configures the filesystem being
> made and so influences the size and alignment of the IO it does....
>
> Cheers,
>
> Dave.
>
