RE: regression introduced by "block: Add support for DAX reads/writes to block devices"

From: Wilcox, Matthew R
Date: Thu Aug 06 2015 - 10:21:34 EST


I think I see the problem. I'm kind of wrapped up in other things right now; can you try replacing the line in dax_io():

- bh->b_size = PAGE_ALIGN(end - pos);
+ bh->b_size = ALIGN(end - pos, 1 << blkbits);
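
A rough user-space sketch of the arithmetic behind that one-line change, for anyone following along. The SKETCH_* macros and the two helper functions are illustrative stand-ins, not the kernel's own definitions, and the 4096-byte page size is an assumption. The 8 GiB device size is also an assumption, inferred from the blocks=2097152, bsize=4096 figures in the mkfs output quoted below.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's ALIGN()/PAGE_ALIGN(); assumes 4 KiB pages. */
#define SKETCH_PAGE_SIZE     4096ULL
#define SKETCH_ALIGN(x, a)   (((x) + ((a) - 1)) & ~((a) - 1))
#define SKETCH_PAGE_ALIGN(x) SKETCH_ALIGN((x), SKETCH_PAGE_SIZE)

/* Old behaviour: round the remaining I/O length up to a whole page. */
uint64_t size_page_aligned(uint64_t pos, uint64_t end)
{
	return SKETCH_PAGE_ALIGN(end - pos);
}

/* Proposed behaviour: round the remaining I/O length up to the block size. */
uint64_t size_block_aligned(uint64_t pos, uint64_t end, unsigned int blkbits)
{
	return SKETCH_ALIGN(end - pos, (uint64_t)1 << blkbits);
}

/* Hypothetical self-check: a 512-byte read of the last sector. */
void sketch_self_check(void)
{
	uint64_t dev_size = 2097152ULL * 4096;	/* assumed device size, 8 GiB */
	uint64_t pos = dev_size - 512;

	assert(size_page_aligned(pos, dev_size) == 4096);	/* runs past the device end */
	assert(size_block_aligned(pos, dev_size, 9) == 512);	/* stays inside the device */
}
```

With a 512-byte block size (blkbits = 9), the request length stays at 512 bytes instead of being rounded up to a full page.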

-----Original Message-----
From: Jeff Moyer [mailto:jmoyer@xxxxxxxxxx]
Sent: Wednesday, August 05, 2015 1:19 PM
To: Wilcox, Matthew R; linda.knippers@xxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx
Subject: regression introduced by "block: Add support for DAX reads/writes to block devices"

Hi, Matthew,

Linda Knippers noticed that commit (bbab37ddc20b) breaks mkfs.xfs:

# mkfs -t xfs -f /dev/pmem0
meta-data=/dev/pmem0             isize=256    agcount=4, agsize=524288 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: read failed: Numerical result out of range

I sat down with Linda to look into it, and the problem is that mkfs.xfs
sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
from the last sector of the device. This results in dax_io trying to do
a page-sized I/O at 512 bytes from the end of the device.
bdev_direct_access, receiving this bogus pos/size combo, returns
-ERANGE:

	if ((sector + DIV_ROUND_UP(size, 512)) >
	    part_nr_sects_read(bdev->bd_part))
		return -ERANGE;
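
To make the failure concrete, here is a minimal user-space model of that check. It is only a sketch: check_range() and its nr_sects parameter are hypothetical stand-ins for the kernel code and for part_nr_sects_read(), and the sector count assumes an 8 GiB device (2097152 blocks of 4096 bytes, per the mkfs output above).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model of the bounds check quoted above: reject any
 * request whose last 512-byte sector would lie beyond the device.
 */
int check_range(uint64_t sector, uint64_t size, uint64_t nr_sects)
{
	uint64_t sectors_needed = (size + 511) / 512;	/* DIV_ROUND_UP(size, 512) */

	if (sector + sectors_needed > nr_sects)
		return -ERANGE;
	return 0;
}

/* Hypothetical self-check: read starting at the last sector. */
void sketch_self_check(void)
{
	uint64_t nr_sects = 2097152ULL * 4096 / 512;	/* assumed: 16777216 sectors */
	uint64_t last = nr_sects - 1;

	assert(check_range(last, 4096, nr_sects) == -ERANGE);	/* page-sized I/O fails */
	assert(check_range(last, 512, nr_sects) == 0);		/* sector-sized I/O fits */
}
```

A page-sized request at the last sector needs 8 sectors where only 1 remains, so the check fails; a 512-byte request at the same position passes.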

Given that file systems supporting dax refuse to mount with a blocksize
!= page size, I'm guessing this is sort of expected behavior. However,
we really shouldn't be breaking direct I/O on pmem devices.

So, what do you want to do? We could make the pmem device's logical
block size fixed at the system page size. Or, we could modify the dax
code to work with blocksize < pagesize. Or, we could continue using the
direct I/O codepath for direct block device access. What do you think?

Thanks,
Jeff and Linda