Re: getdents - ext4 vs btrfs performance

From: Chris Mason
Date: Mon Mar 05 2012 - 19:37:21 EST


On Mon, Mar 05, 2012 at 12:32:45PM +0100, Jacek Luczak wrote:
> 2012/3/4 Jacek Luczak <difrost.kernel@xxxxxxxxx>:
> > 2012/3/3 Jacek Luczak <difrost.kernel@xxxxxxxxx>:
> >> 2012/3/2 Chris Mason <chris.mason@xxxxxxxxxx>:
> >>> On Fri, Mar 02, 2012 at 03:16:12PM +0100, Jacek Luczak wrote:
> >>>> 2012/3/2 Chris Mason <chris.mason@xxxxxxxxxx>:
> >>>> > On Fri, Mar 02, 2012 at 11:05:56AM +0100, Jacek Luczak wrote:
> >>>> >>
> >>>> >> I've took both on tests. The subject is acp and spd_readdir used with
> >>>> >> tar, all on ext4:
> >>>> >> 1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
> >>>> >> 2) spd_readdir: http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readir.png
> >>>> >> 3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png
> >>>> >>
> >>>> >> The acp looks much better than spd_readdir but directory copy with
> >>>> >> spd_readdir decreased to 52m 39sec (30 min less).
> >>>> >
> >>>> > Do you have stats on how big these files are, and how fragmented they
> >>>> > are?  For acp and spd to give us this, I think something has gone wrong
> >>>> > at writeback time (creating individual fragmented files).
> >>>>
> >>>> How big? Which files?
> >>>
> >>> All the files you're reading ;)
> >>>
> >>> filefrag will tell you how many extents each file has, any file with
> >>> more than one extent is interesting.  (The ext4 crowd may have better
> >>> suggestions on measuring fragmentation).
> >>>
> >>> Since you mention this is a compile farm, I'm guessing there are a bunch
> >>> of .o files created by parallel builds.  There are a lot of chances for
> >>> delalloc and the kernel writeback code to do the wrong thing here.
> >>>
> >>
> > [Most of files are B and K size]
> >>
> >> All files scanned: 1978149
> >> Files fragmented: 313 (0.015%) where 11 have 3+ extents
> >> Total size of fragmented files: 7GB (~13% of dir size)

Ok, so I don't have a lot of great new ideas. My guess is that inode
order and disk order for the blocks aren't matching up. You can confirm
this with:

acp -b some_dir

You can also try telling acp to make a bigger read ahead window:

acp -s 4096 -r 128 some_dir

You can tell acp to scan all the files in the directory tree first
(warning: this might use a good chunk of RAM):

acp -w some_dir

and you can combine all of these together. None of the above
will actually help in your workload, but it'll help narrow down what is
actually seeky on disk.
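
If you want to look at the inode ordering side of this by hand, here is a
rough sketch of the idea behind spd_readdir: read the directory entries,
then sort them by inode number before touching the files. This is only an
illustration and not what acp or spd_readdir actually run; the function
name files_in_inode_order is made up for the example, and it only looks at
a single directory, not a whole tree.

```python
import os
import sys


def files_in_inode_order(directory):
    """Return (inode, name) pairs for the entries of `directory`,
    sorted by inode number instead of readdir order.

    Hypothetical helper for illustration only.
    """
    with os.scandir(directory) as entries:
        pairs = [(entry.inode(), entry.name) for entry in entries]
    # Tuples sort by their first element, so this orders by inode number.
    pairs.sort()
    return pairs


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for inode, name in files_in_inode_order(target):
        print(inode, name)
```

Reading files in this order only helps if inode order roughly tracks where
the data blocks sit on disk, which is exactly the assumption I suspect is
not holding here.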

-chris