Re: DAX can not work on virtual nvdimm device

From: Jan Kara
Date: Fri Sep 09 2016 - 05:20:04 EST


On Thu 08-09-16 14:47:08, Ross Zwisler wrote:
> On Tue, Sep 06, 2016 at 05:06:20PM +0200, Jan Kara wrote:
> > On Thu 01-09-16 20:57:38, Ross Zwisler wrote:
> > > On Wed, Aug 31, 2016 at 04:44:47PM +0800, Xiao Guangrong wrote:
> > > > On 08/31/2016 01:09 AM, Dan Williams wrote:
> > > > >
> > > > > Can you post your exact reproduction steps? This test is not failing for me.
> > > > >
> > > >
> > > > Sure.
> > > >
> > > > 1. make the guest kernel based on your tree, the top commit is
> > > > 10d7902fa0e82b (dax: unmap/truncate on device shutdown) and
> > > > the config file can be found in this thread.
> > > >
> > > > 2. add guest kernel command line: memmap=6G!10G
> > > >
> > > > 3. start the guest:
> > > > x86_64-softmmu/qemu-system-x86_64 -machine pc,nvdimm --enable-kvm \
> > > > -smp 16 -m 32G,maxmem=100G,slots=100 /other/VMs/centos6.img -monitor stdio
> > > >
> > > > 4. in guest:
> > > > mkfs.ext4 /dev/pmem0
> > > > mount -o dax /dev/pmem0 /mnt/pmem/
> > > > echo > /mnt/pmem/xxx
> > > > ./mmap /mnt/pmem/xxx
> > > > ./read /mnt/pmem/xxx
> > > >
> > > > The source code of mmap and read has been attached in this mail.
> > > >
> > > > Hopefully, you can detect the error triggered by read test.
> > > >
> > > > Thanks!
> > >
> > > Okay, I think I've isolated this issue. Xiao's VM was an old CentOS 6 system,
> > > and for some reason ext4+DAX with the old tools found in that VM fails. I was
> > > able to reproduce this failure with a freshly installed CentOS 6.8 VM.
> > >
> > > You can see the failure with his tests, or perhaps more easily with this
> > > series of commands:
> > >
> > > # mkfs.ext4 /dev/pmem0
> > > # mount -o dax /dev/pmem0 /mnt/pmem/
> > > # touch /mnt/pmem/x
> > > # md5sum /mnt/pmem/x
> > > md5sum: /mnt/pmem/x: Bad address
> > >
> > > This sequence of commands works fine in the old CentOS 6 system if you use XFS
> > > instead of ext4, and it works fine with both ext4 and XFS in CentOS 7 and
> > > with recent versions of Fedora.
> > >
> > > I've added the ext4 folks to this mail in case they care, but my guess is that
> > > the tools in CentOS 6 are so old that it's not worth worrying about. For
> > > reference, the kernel in CentOS 6 is based on 2.6.32. :) DAX was introduced
> > > in v4.0.
> >
> > Hum, can you post 'dumpe2fs -h /dev/pmem0' output from that system when the
> > md5sum fails? Because the only idea I have is that mkfs.ext4 in CentOS 6
> > creates the filesystem with a different set of features than more recent
> > e2fsprogs and so we hit some untested path...
>
> Sure, here's the output:
>
> # dumpe2fs -h /dev/pmem0
> dumpe2fs 1.41.12 (17-May-2010)
> Filesystem volume name: <none>
> Last mounted on: /mnt/pmem
> Filesystem UUID: 4cd8a836-cc54-4c59-ae0a-4a26bab0f8bc
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype
> needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg
> dir_nlink extra_isize
> Filesystem flags: signed_directory_hash
> Default mount options: (none)
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 1048576
> Block count: 4194304
> Reserved block count: 209715
> Free blocks: 4084463
> Free inodes: 1048565
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Reserved GDT blocks: 1023
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 8192
> Inode blocks per group: 512
> RAID stride: 1
> Flex block group size: 16
> Filesystem created: Thu Sep 8 14:45:31 2016
> Last mount time: Thu Sep 8 14:45:39 2016
> Last write time: Thu Sep 8 14:45:39 2016
> Mount count: 1
> Maximum mount count: 21
> Last checked: Thu Sep 8 14:45:31 2016
> Check interval: 15552000 (6 months)
> Next check after: Tue Mar 7 13:45:31 2017
> Lifetime writes: 388 MB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 28
> Desired extra isize: 28
> Journal inode: 8
> Default directory hash: half_md4
> Directory Hash Seed: 19cad581-c46a-4212-bfa0-d527ff55db49
> Journal backup: inode blocks
> Journal features: (none)
> Journal size: 128M
> Journal length: 32768
> Journal sequence: 0x00000002
> Journal start: 1

Hum, nothing unusual in there. I've tried reproducing this on a local SLE11 SP3
machine (whose userspace dates from about the same time as CentOS 6), but
everything works as expected there. Shrug...
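
For anyone who wants to check the "different set of features" idea on their
own setup, here is a minimal sketch that compares the "Filesystem features:"
line of two 'dumpe2fs -h' outputs. The function names parse_features and
diff_features are hypothetical helpers, not part of e2fsprogs. The first
feature list in the example is the one quoted above. The second is only an
illustrative stand-in for what a newer mkfs.ext4 might produce, not output
captured from any system in this thread.

```python
import re


def parse_features(dumpe2fs_output):
    """Return the feature names on the 'Filesystem features:' line as a set.

    Assumes the features are all on one line, as dumpe2fs prints them
    (the wrapping in the quoted mail comes from the mail client).
    """
    for line in dumpe2fs_output.splitlines():
        match = re.match(r"\s*Filesystem features:\s*(.*)", line)
        if match:
            return set(match.group(1).split())
    return set()


def diff_features(old_output, new_output):
    """Return (only_in_old, only_in_new) as two sorted lists."""
    old = parse_features(old_output)
    new = parse_features(new_output)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    # Feature list from the dumpe2fs 1.41.12 output quoted in this mail.
    centos6 = (
        "Filesystem features:      has_journal ext_attr resize_inode "
        "dir_index filetype needs_recovery extent flex_bg sparse_super "
        "large_file huge_file uninit_bg dir_nlink extra_isize\n"
    )
    # Illustrative only: an assumed feature list for a newer e2fsprogs.
    newer = (
        "Filesystem features:      has_journal ext_attr resize_inode "
        "dir_index filetype needs_recovery extent 64bit flex_bg "
        "sparse_super large_file huge_file dir_nlink extra_isize "
        "metadata_csum\n"
    )
    only_old, only_new = diff_features(centos6, newer)
    print("only in old:", " ".join(only_old))
    print("only in new:", " ".join(only_new))
```

To use it on real systems, save 'dumpe2fs -h /dev/pmem0' from each machine to
a file and pass the two file contents to diff_features. An empty result on
both sides would suggest the on-disk feature set is not what differs.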

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR