Re: [PATCH v2 0/9] re-enable DAX PMD support

From: Kani, Toshimitsu
Date: Wed Aug 31 2016 - 16:20:59 EST


On Tue, 2016-08-30 at 17:01 -0600, Ross Zwisler wrote:
> On Tue, Aug 23, 2016 at 04:04:10PM -0600, Ross Zwisler wrote:
> >
> > DAX PMDs have been disabled since Jan Kara introduced DAX radix
> > tree based locking.  This series allows DAX PMDs to participate in
> > the DAX radix tree based locking scheme so that they can be re-
> > enabled.
> >
> > Changes since v1:
> >  - PMD entry locking is now done based on the starting offset of
> > the PMD entry, rather than on the radix tree slot which was
> > unreliable. (Jan)
> >  - Fixed the one issue I could find with hole punch.  As far as I
> > can tell hole punch now works correctly for both PMD and PTE DAX
> > entries, 4k zero pages and huge zero pages.
> >  - Fixed the way that ext2 returns the size of holes in
> > ext2_get_block(). (Jan)
> >  - Made the 'wait_table' global variable static in response to a
> > sparse warning.
> >  - Fixed some more inconsistent usage between the names 'ret' and
> > 'entry' for radix tree entry variables.
> >
> > Ross Zwisler (9):
> >   ext4: allow DAX writeback for hole punch
> >   ext2: tell DAX the size of allocation holes
> >   ext4: tell DAX the size of allocation holes
> >   dax: remove buffer_size_valid()
> >   dax: make 'wait_table' global variable static
> >   dax: consistent variable naming for DAX entries
> >   dax: coordinate locking for offsets in PMD range
> >   dax: re-enable DAX PMD support
> >   dax: remove "depends on BROKEN" from FS_DAX_PMD
> >
> >  fs/Kconfig          |   1 -
> >  fs/dax.c            | 297 +++++++++++++++++++++++++++++-----------------------
> >  fs/ext2/inode.c     |   3 +
> >  fs/ext4/inode.c     |   7 +-
> >  include/linux/dax.h |  29 ++++-
> >  mm/filemap.c        |   6 +-
> >  6 files changed, 201 insertions(+), 142 deletions(-)
> >
> > -- 
> > 2.9.0
>
> Ping on this series?  Any objections or comments?
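The v1->v2 change-log item about locking a PMD entry by its starting offset can be pictured with a small user-space sketch. It assumes x86-64 geometry (4 KiB pages, 2 MiB PMDs, so 512 page offsets per PMD entry); the macro and function names are hypothetical and this is not the fs/dax.c code, only an illustration of rounding a page offset down to the first offset of its PMD range.

```c
#include <stdint.h>

/* Assumed x86-64 geometry: 4 KiB pages and 2 MiB PMD mappings. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PMD_SHIFT  21
/* Number of page-sized offsets covered by one PMD entry (512 here). */
#define DEMO_PMD_NR_PAGES (1UL << (DEMO_PMD_SHIFT - DEMO_PAGE_SHIFT))

/*
 * Hypothetical helper: round a page offset down to the first page offset
 * of the 2 MiB range that contains it.  Keying a PMD entry's lock on this
 * value means every offset covered by that entry maps to the same key.
 */
static inline uint64_t demo_pmd_lock_index(uint64_t page_index)
{
    return page_index & ~(uint64_t)(DEMO_PMD_NR_PAGES - 1);
}
```

For example, page offsets 512 through 1023 all yield 512, so a lookup anywhere inside that 2 MiB range resolves to one lock key.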

Hi Ross,

I am seeing a major performance loss in the fio mmap test with this
patchset applied.  This happens with or without my patches [1] applied
on top of yours.  Without my patches, dax_pmd_fault() falls back to the
pte handler since the mmap'ed address is not 2MB-aligned.
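The fallback described above can be modelled with a simplified check. This sketch assumes 2 MiB PMDs and reduces the conditions to two: the 2 MiB-aligned virtual block around the faulting address must lie inside the VMA, and the file offset backing that block must itself be 2 MiB-aligned. `struct demo_vma` and `demo_can_map_pmd()` are hypothetical names; the real dax_pmd_fault() checks more than this (among other things, that the backing memory is suitably aligned).

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PMD_SIZE (2ULL * 1024 * 1024)       /* assumed 2 MiB PMD */
#define DEMO_PMD_MASK (~(DEMO_PMD_SIZE - 1))

/* Hypothetical, simplified description of a file mapping. */
struct demo_vma {
    uint64_t vm_start;   /* first mapped virtual address */
    uint64_t vm_end;     /* one past the last mapped address */
    uint64_t file_off;   /* file offset (bytes) mapped at vm_start */
};

/*
 * Simplified model: return true if a fault at @addr could be served by a
 * 2 MiB mapping, false if it would fall back to 4 KiB PTEs.
 */
static bool demo_can_map_pmd(const struct demo_vma *vma, uint64_t addr)
{
    uint64_t pmd_addr = addr & DEMO_PMD_MASK;

    /* The whole 2 MiB block must lie inside the VMA. */
    if (pmd_addr < vma->vm_start || pmd_addr + DEMO_PMD_SIZE > vma->vm_end)
        return false;

    /* The file offset that would back pmd_addr must be 2 MiB-aligned. */
    uint64_t off = vma->file_off + (pmd_addr - vma->vm_start);
    return (off & (DEMO_PMD_SIZE - 1)) == 0;
}
```

Under this model, with file offset 0 a VMA starting at 0x40001000 (4 KiB past a 2 MiB boundary) can never use a PMD: every 2 MiB-aligned virtual block inside it maps to a file offset that is 4 KiB short of a 2 MiB boundary.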

I have attached three test results.
 o rc4.log - 4.8.0-rc4 (base)
 o non-pmd.log - 4.8.0-rc4 + your patchset (fall back to pte)
 o pmd.log - 4.8.0-rc4 + your patchset + my patchset (use pmd maps)

My test steps are as follows.

mkfs.ext4 -O bigalloc -C 2M /dev/pmem0
mount -o dax /dev/pmem0 /mnt/pmem0
numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio test.fio

"test.fio"
---
[global]
bs=4k
size=2G
directory=/mnt/pmem0
ioengine=mmap
[randrw]
rw=randrw
---
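For readers without fio at hand, the access pattern of this job can be roughly approximated in C. Assumptions: fio's mmap engine copies data through the mapping with memcpy(), and rw=randrw uses fio's default 50/50 read/write mix. `demo_randrw()` and the file path in the test are hypothetical; this is not fio's implementation and will not reproduce its numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Rough approximation (not fio's code): map @path with MAP_SHARED and
 * issue @nr_ios random, block-aligned accesses of @bs bytes, about half
 * reads and half writes, each done with memcpy() through the mapping.
 * Returns 0 on success, -1 on error.
 */
static int demo_randrw(const char *path, size_t size, size_t bs,
                       unsigned long nr_ios, unsigned int seed)
{
    if (bs == 0 || size < bs)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;

    char *buf = malloc(bs);
    if (buf == NULL) {
        munmap(map, size);
        return -1;
    }
    memset(buf, 0xab, bs);

    size_t nr_blocks = size / bs;
    for (unsigned long i = 0; i < nr_ios; i++) {
        /* Pick a random block-aligned offset within the file. */
        size_t off = ((size_t)rand_r(&seed) % nr_blocks) * bs;
        if (rand_r(&seed) & 1)
            memcpy(buf, map + off, bs);   /* read through the mapping */
        else
            memcpy(map + off, buf, bs);   /* write through the mapping */
    }

    free(buf);
    return munmap(map, size);
}
```

With 4 KiB PTE mappings, the first touch of each 4 KiB page takes its own page fault; a 2 MiB PMD mapping covers 512 such pages per fault, which is why the alignment fallback matters for this workload.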

Can you please take a look?
Thanks,
-Toshi

[1] https://lkml.org/lkml/2016/8/29/560




randrw: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
fio-2.6
Starting 1 process
randrw: Laying out IO file(s) (1 file(s) / 2048MB)

randrw: (groupid=0, jobs=1): err= 0: pid=12656: Wed Aug 31 18:14:06 2016
read : io=1024.7MB, bw=3076.4KB/s, iops=769, runt=341062msec
clat (usec): min=415, max=1703, avg=509.78, stdev=37.40
lat (usec): min=415, max=1703, avg=509.81, stdev=37.40
clat percentiles (usec):
| 1.00th=[ 482], 5.00th=[ 498], 10.00th=[ 498], 20.00th=[ 498],
| 30.00th=[ 502], 40.00th=[ 502], 50.00th=[ 502], 60.00th=[ 502],
| 70.00th=[ 502], 80.00th=[ 506], 90.00th=[ 524], 95.00th=[ 540],
| 99.00th=[ 724], 99.50th=[ 732], 99.90th=[ 748], 99.95th=[ 860],
| 99.99th=[ 900]
bw (KB /s): min= 2688, max= 3552, per=100.00%, avg=3078.69, stdev=143.84
write: io=1023.4MB, bw=3072.6KB/s, iops=768, runt=341062msec
clat (usec): min=683, max=1955, avg=788.99, stdev=45.83
lat (usec): min=683, max=1955, avg=789.04, stdev=45.84
clat percentiles (usec):
| 1.00th=[ 756], 5.00th=[ 772], 10.00th=[ 772], 20.00th=[ 772],
| 30.00th=[ 772], 40.00th=[ 780], 50.00th=[ 780], 60.00th=[ 780],
| 70.00th=[ 780], 80.00th=[ 788], 90.00th=[ 812], 95.00th=[ 828],
| 99.00th=[ 1004], 99.50th=[ 1012], 99.90th=[ 1128], 99.95th=[ 1144],
| 99.99th=[ 1208]
bw (KB /s): min= 2752, max= 3552, per=100.00%, avg=3074.60, stdev=96.62
lat (usec) : 500=12.55%, 750=37.73%, 1000=48.96%
lat (msec) : 2=0.76%
cpu : usr=99.96%, sys=0.01%, ctx=32870, majf=0, minf=3014
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=262309/w=261979/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
READ: io=1024.7MB, aggrb=3076KB/s, minb=3076KB/s, maxb=3076KB/s, mint=341062msec, maxt=341062msec
WRITE: io=1023.4MB, aggrb=3072KB/s, minb=3072KB/s, maxb=3072KB/s, mint=341062msec, maxt=341062msec

Disk stats (read/write):
pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

randrw: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
fio-2.6
Starting 1 process
randrw: Laying out IO file(s) (1 file(s) / 2048MB)

randrw: (groupid=0, jobs=1): err= 0: pid=19521: Wed Aug 31 17:50:39 2016
read : io=1024.7MB, bw=3034.5KB/s, iops=758, runt=345780msec
clat (usec): min=492, max=1359, avg=517.20, stdev=55.87
lat (usec): min=492, max=1359, avg=517.23, stdev=55.87
clat percentiles (usec):
| 1.00th=[ 498], 5.00th=[ 498], 10.00th=[ 498], 20.00th=[ 498],
| 30.00th=[ 502], 40.00th=[ 502], 50.00th=[ 502], 60.00th=[ 502],
| 70.00th=[ 502], 80.00th=[ 506], 90.00th=[ 524], 95.00th=[ 708],
| 99.00th=[ 740], 99.50th=[ 756], 99.90th=[ 900], 99.95th=[ 908],
| 99.99th=[ 1048]
bw (KB /s): min= 2600, max= 3448, per=100.00%, avg=3036.52, stdev=141.59
write: io=1023.4MB, bw=3030.6KB/s, iops=757, runt=345780msec
clat (usec): min=765, max=1788, avg=799.46, stdev=67.19
lat (usec): min=766, max=1788, avg=799.50, stdev=67.20
clat percentiles (usec):
| 1.00th=[ 772], 5.00th=[ 772], 10.00th=[ 772], 20.00th=[ 772],
| 30.00th=[ 772], 40.00th=[ 780], 50.00th=[ 780], 60.00th=[ 780],
| 70.00th=[ 780], 80.00th=[ 788], 90.00th=[ 820], 95.00th=[ 996],
| 99.00th=[ 1020], 99.50th=[ 1144], 99.90th=[ 1176], 99.95th=[ 1208],
| 99.99th=[ 1320]
bw (KB /s): min= 2704, max= 3328, per=100.00%, avg=3032.56, stdev=93.00
lat (usec) : 500=10.66%, 750=39.06%, 1000=48.19%
lat (msec) : 2=2.08%
cpu : usr=99.96%, sys=0.00%, ctx=32513, majf=0, minf=3012
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=262309/w=261979/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
READ: io=1024.7MB, aggrb=3034KB/s, minb=3034KB/s, maxb=3034KB/s, mint=345780msec, maxt=345780msec
WRITE: io=1023.4MB, aggrb=3030KB/s, minb=3030KB/s, maxb=3030KB/s, mint=345780msec, maxt=345780msec

Disk stats (read/write):
pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

randrw: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
fio-2.6
Starting 1 process
randrw: Laying out IO file(s) (1 file(s) / 2048MB)

randrw: (groupid=0, jobs=1): err= 0: pid=12678: Wed Aug 31 19:59:45 2016
read : io=1024.7MB, bw=775489KB/s, iops=193872, runt= 1353msec
clat (usec): min=1, max=297, avg= 1.67, stdev= 2.92
lat (usec): min=1, max=297, avg= 1.70, stdev= 2.96
clat percentiles (usec):
| 1.00th=[ 1], 5.00th=[ 1], 10.00th=[ 1], 20.00th=[ 1],
| 30.00th=[ 1], 40.00th=[ 1], 50.00th=[ 2], 60.00th=[ 2],
| 70.00th=[ 2], 80.00th=[ 2], 90.00th=[ 2], 95.00th=[ 2],
| 99.00th=[ 3], 99.50th=[ 4], 99.90th=[ 12], 99.95th=[ 12],
| 99.99th=[ 189]
bw (KB /s): min=736608, max=792296, per=98.58%, avg=764452.00, stdev=39377.36
write: io=1023.4MB, bw=774513KB/s, iops=193628, runt= 1353msec
clat (usec): min=2, max=235, avg= 2.66, stdev= 3.59
lat (usec): min=2, max=235, avg= 2.70, stdev= 3.61
clat percentiles (usec):
| 1.00th=[ 2], 5.00th=[ 2], 10.00th=[ 2], 20.00th=[ 2],
| 30.00th=[ 2], 40.00th=[ 2], 50.00th=[ 3], 60.00th=[ 3],
| 70.00th=[ 3], 80.00th=[ 3], 90.00th=[ 3], 95.00th=[ 3],
| 99.00th=[ 4], 99.50th=[ 6], 99.90th=[ 13], 99.95th=[ 14],
| 99.99th=[ 193]
bw (KB /s): min=736288, max=789440, per=98.50%, avg=762864.00, stdev=37584.14
lat (usec) : 2=20.18%, 4=78.23%, 10=1.40%, 20=0.16%, 50=0.01%
lat (usec) : 250=0.03%, 500=0.01%
cpu : usr=46.82%, sys=53.03%, ctx=135, majf=0, minf=786279
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=262309/w=261979/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
READ: io=1024.7MB, aggrb=775488KB/s, minb=775488KB/s, maxb=775488KB/s, mint=1353msec, maxt=1353msec
WRITE: io=1023.4MB, aggrb=774512KB/s, minb=774512KB/s, maxb=774512KB/s, mint=1353msec, maxt=1353msec

Disk stats (read/write):
pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
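As a sanity check on the attached numbers, fio's reported bandwidth is iops times the 4 KiB block size, and the aggregate read bandwidth of the fastest run is roughly 250 times that of the two slower runs, with the runtimes differing by the same factor. The logs carry no file names, so this arithmetic alone does not say which kernel produced which run:

```latex
\begin{aligned}
\mathrm{bw} &= \mathrm{iops} \times \mathrm{bs} \\
769 \times 4\,\mathrm{KB} &= 3076\,\mathrm{KB/s} \quad \text{(first log, read)} \\
193872 \times 4\,\mathrm{KB} &= 775488\,\mathrm{KB/s} \quad \text{(third log, read)} \\
775488 / 3076 &\approx 252, \qquad 775488 / 3034 \approx 256 \\
341062\,\mathrm{ms} / 1353\,\mathrm{ms} &\approx 252
\end{aligned}
```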