Re: regression introduced by "block: Add support for DAX reads/writes to block devices"

From: Boaz Harrosh
Date: Thu Aug 06 2015 - 03:52:55 EST


On 08/06/2015 06:24 AM, Dave Chinner wrote:
> On Wed, Aug 05, 2015 at 09:42:54PM -0400, Linda Knippers wrote:
>> On 08/05/2015 06:01 PM, Dave Chinner wrote:
>>> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
<>
>>>>
>>>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>>>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>>>> from the last sector of the device. This results in dax_io trying to do
>>>> a page-sized I/O at 512 bytes from the end of the device.
>>>

This part I do not understand. How is mkfs.xfs reading the sector?
Is it through open(/dev/pmem0, ...)? With O_DIRECT?

If so, then yes: the inode of /dev/pmem0 is IS_DAX() and will try
to use the dax.c code (I think; which kernel?).

Which means this is a bug.
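To make the question concrete, here is a minimal user-space sketch of the access pattern Jeff describes: set the soft block size to 512 with BLKBSZSET, then read the last 512 bytes of the device. It assumes the device is opened with O_DIRECT, which is exactly what is being asked above and is not confirmed for mkfs.xfs; read_last_sector() and last_block_offset() are hypothetical helper names, and BLKBSZSET needs CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>
#include <linux/fs.h>          /* BLKBSZSET, BLKGETSIZE64 */
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Byte offset of the last block of a device that is dev_bytes long. */
uint64_t last_block_offset(uint64_t dev_bytes, unsigned int blksz)
{
	return dev_bytes - blksz;
}

/*
 * Hypothetical reproducer (assumes O_DIRECT): set the soft block size
 * to 512, then read the final 512-byte sector of the device.
 * Returns the number of bytes read, or -1 on any error.
 */
ssize_t read_last_sector(const char *dev)
{
	int blksz = 512;
	uint64_t dev_bytes = 0;
	void *buf = NULL;
	ssize_t ret = -1;
	int fd = open(dev, O_RDONLY | O_DIRECT);

	if (fd < 0)
		return -1;
	if (ioctl(fd, BLKBSZSET, &blksz) < 0 ||
	    ioctl(fd, BLKGETSIZE64, &dev_bytes) < 0)
		goto out;
	/* O_DIRECT needs a buffer aligned to the logical block size. */
	if (posix_memalign(&buf, 512, 512) != 0)
		goto out;
	ret = pread(fd, buf, 512, (off_t)last_block_offset(dev_bytes, 512));
out:
	free(buf);             /* free(NULL) is a no-op */
	close(fd);
	return ret;
}
```

On a kernel with the regression, read_last_sector("/dev/pmem0") would be expected to return -1 here, while the same read against a non-DAX block device returns 512.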

>>> Right - we have to be able to do IO to that last sector, so this is
>>> a sanity check to tell if the block dev is large enough. The XFS
>>> kernel code does the same end-of-device sector read when the
>>> filesystem is mounted, too.
>>>
>>>> bdev_direct_access, receiving this bogus pos/size combo, returns
>>>> -ERANGE:
>>>>
>>>>     if ((sector + DIV_ROUND_UP(size, 512)) >
>>>>                     part_nr_sects_read(bdev->bd_part))
>>>>             return -ERANGE;
>>>>
>>>> Given that file systems supporting dax refuse to mount with a blocksize
>>>> != page size, I'm guessing this is sort of expected behavior. However,
>>>> we really shouldn't be breaking direct I/O on pmem devices.
>>>

No, this is a BUG. Buffered or direct reads/writes to an IS_DAX() inode
should accept any alignment and size, because with DAX buffered and direct
I/O take the exact same code path, and buffered I/O expects I/O of any size.

This is probably a bug in the DAX handling of the block-device inode. Let me
test this; I will send a fix ASAP.

<>
>>> the output of:
>>>
>>> /sys/block/pmem0/queue/logical_block_size
>> 512
>>
>>> /sys/block/pmem0/queue/physical_block_size
>> 512
>>

There is a pending fix for this.
Do you need it sent to stable?

>>> /sys/block/pmem0/queue/hw_sector_size
>> 512
>>
>>> /sys/block/pmem0/queue/minimum_io_size
>> 512
>>
>>> /sys/block/pmem0/queue/optimal_io_size
>> 0
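The queue attributes Dave asked for can also be collected programmatically. This is a small sketch with hypothetical helper names: read_queue_attr() reads /sys/block/&lt;disk&gt;/queue/&lt;attr&gt; and parse_queue_attr() turns the single-line value into a number, returning -1 if the file is missing or the content is not an integer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs value such as "512\n"; returns -1 if it is not a number. */
long parse_queue_attr(const char *text)
{
	char *end;
	long v = strtol(text, &end, 10);

	if (end == text || (*end != '\0' && *end != '\n'))
		return -1;
	return v;
}

/* Read /sys/block/<disk>/queue/<attr>, e.g. ("pmem0", "logical_block_size"). */
long read_queue_attr(const char *disk, const char *attr)
{
	char path[256], text[64];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", disk, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(text, sizeof(text), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_queue_attr(text);
}
```

Calling read_queue_attr("pmem0", attr) for logical_block_size, physical_block_size, hw_sector_size, minimum_io_size and optimal_io_size should reproduce the values Linda reported.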

Thanks
Boaz

