Re: xfs: very slow after mount, very slow at umount

From: Dave Chinner
Date: Fri Jan 28 2011 - 08:57:00 EST

Next message: Julia Lawall: "[PATCH 2/2] net/netlabel: Avoid call to genlmsg_cancel"
Previous message: Tejun Heo: "Re: [PATCHSET] x86: unify x86_32 and 64 NUMA init paths, take#5"
In reply to: david: "Re: xfs: very slow after mount, very slow at umount"
Next in thread: david: "Re: xfs: very slow after mount, very slow at umount"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, Jan 27, 2011 at 06:09:58PM -0800, david@xxxxxxx wrote:
> On Thu, 27 Jan 2011, Stan Hoeppner wrote:
> >david@xxxxxxx put forth on 1/27/2011 2:11 PM:
> >
> >>how do I understand how to setup things on multi-disk systems? the documentation
> >>I've found online is not that helpful, and in some ways contradictory.
> >
> >Visit http://xfs.org There you will find:
> >
> >Users guide:
> >http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/index.html
> >
> >File system structure:
> >http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure//tmp/en-US/html/index.html
> >
> >Training labs:
> >http://xfs.org/docs/xfsdocs-xml-dev/XFS_Labs/tmp/en-US/html/index.html
>
> thanks for the pointers.
>
> >>If there really are good rules for how to do this, it would be very helpful if
> >>you could just give mkfs.xfs the information about your system (this partition
> >>is on a 16 drive raid6 array) and have it do the right thing.
> >
> >If your disk array is built upon Linux mdraid, recent versions of mkfs.xfs will
> >read the parameters and automatically make the filesystem accordingly, properly.
> >
> >mkfs.xfs will not do this for PCIe/x hardware RAID arrays or external FC/iSCSI
> >based SAN arrays as there is no standard place to acquire the RAID configuration
> >information for such systems. For these you will need to configure mkfs.xfs
> >manually.
> >
> >At minimum you will want to specify stripe width (sw) which needs to match the
> >hardware stripe width. For RAID0 sw=[#of_disks]. For RAID 10, sw=[#disks/2].
> >For RAID5 sw=[#disks-1]. For RAID6 sw=[#disks-2].
> >
> >You'll want at minimum agcount=16 for striped hardware arrays. Depending on the
> >number and spindle speed of the disks, the total size of the array, the
> >characteristics of the RAID controller (big or small cache), you may want to
> >increase agcount. Experimentation may be required to find the optimum
> >parameters for a given hardware RAID array. Typically all other parameters may
> >be left at defaults.
>
> does this value change depending on the number of disks in the array?

Only depending on block device capacity. Once at the maximum AG size
(1TB), mkfs has to add more AGs. So once above 4TB for hardware RAID
LUNs and 16TB for md/dm devices, you will get an AG per TB of
storage by default.
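
A rough sketch of that default, for illustration only. It assumes the
baseline counts implied by the thresholds above (4 AGs for a hardware RAID
LUN or single device, 16 for md/dm devices) and treats "1TB" as 2^40 bytes.
The function name is hypothetical, and the real mkfs.xfs logic handles more
cases (small filesystems, for instance) than this does.

```python
TB = 2 ** 40           # assumption: "1TB" taken as 2^40 bytes
MAX_AG_SIZE = 1 * TB   # maximum AG size stated in the text above


def default_agcount(capacity_bytes, md_or_dm=False):
    """Approximate the default AG count described above (hypothetical helper)."""
    # Baseline implied by the 4TB / 16TB thresholds mentioned in the text.
    base = 16 if md_or_dm else 4
    # Once an AG would exceed the maximum size, more AGs are needed:
    # integer ceiling of capacity / MAX_AG_SIZE.
    needed = -(-capacity_bytes // MAX_AG_SIZE)
    return max(base, needed)


if __name__ == "__main__":
    for size_tb, md in [(2, False), (8, False), (8, True), (20, True)]:
        print(size_tb, "TB", "md/dm" if md else "hw LUN",
              "->", default_agcount(size_tb * TB, md), "AGs")
```

Under these assumptions an 8TB hardware RAID LUN gets 8 AGs, while an 8TB
md device stays at 16 until it grows past 16TB.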

As it is, the optimal number and size of AGs will depend on as many
geometry factors as workload factors, such as the size of the luns,
the way they are striped, whether you are using linear concatenation
of luns or striping them or a combination of both, the amount of
allocation concurrency you require, etc. In these sorts of
situations, mkfs can only make a best guess - to do better you
really need someone proficient in the dark arts to configure the
storage and filesystem optimally.

> >Picking the perfect mkfs.xfs parameters for a hardware RAID array can be
> >somewhat of a black art, mainly because no two vendor arrays act or perform
> >identically.
>
> if mkfs.xfs can figure out how to do the 'right thing' for md raid
> arrays, can there be a mode where it asks the users for the same
> information that it gets from the kernel?

mkfs.xfs can get the information it needs directly from dm and md
devices. However, when hardware RAID luns present themselves to the
OS in an identical manner to single drives, how does mkfs tell the
difference between a 2TB hardware RAID lun made up of 30x73GB drives
and a single 2TB SATA drive? The person running mkfs should already
know this little detail....
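
For a hardware RAID lun, where that detail has to be supplied by hand, the
stripe width rules quoted earlier in this message can be sketched as
follows. The helper names are hypothetical, and the 64k stripe unit is only
a placeholder assumption: use whatever chunk size the controller is actually
configured with. The su= and sw= suboptions of mkfs.xfs -d are the ones
being filled in.

```python
def stripe_width(raid_level, disks):
    """Number of data disks (sw) per the rules quoted above (hypothetical helper)."""
    rules = {
        "raid0": lambda n: n,        # sw = #disks
        "raid10": lambda n: n // 2,  # sw = #disks / 2
        "raid5": lambda n: n - 1,    # sw = #disks - 1
        "raid6": lambda n: n - 2,    # sw = #disks - 2
    }
    if raid_level not in rules:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    sw = rules[raid_level](disks)
    if sw < 1:
        raise ValueError(f"{disks} disks is too few for {raid_level}")
    return sw


def mkfs_data_opts(raid_level, disks, stripe_unit="64k"):
    """Build the -d option string; stripe_unit is an assumed chunk size."""
    return f"-d su={stripe_unit},sw={stripe_width(raid_level, disks)}"


if __name__ == "__main__":
    # The 16 drive raid6 array mentioned earlier in the thread.
    print(mkfs_data_opts("raid6", 16))
```

For the 16 drive raid6 array mentioned earlier in the thread this yields
sw=14.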

> >Systems of a caliber requiring XFS should be thoroughly tested before going into
> >production. Testing _with your workload_ of multiple parameters should be
> >performed to identify those yielding best performance.
>
> <rant>
> the problem with this is that for large arrays, formatting the array
> and loading it with data can take a day or more, even before you
> start running the test. This is made even worse if you are scaling
> up an existing system a couple orders of magnitude, because you may
> not have the full workload available to you.

If your hardware procurement-to-production process doesn't include
testing performance of potential equipment on a representative
workload, then I'd say you have a process problem that we can't help
you solve....

> Saying that you should
> test out every option before going into production is a cop-out.

I never test every option. I know what the options do, so to decide
what to tweak (if anything), I first need to know how a
workload performs on a given storage layout with default options. I
need to have:

	a) some idea of the expected performance of the workload
	b) a baseline performance characterisation of the underlying
	   block devices
	c) a set of baseline performance metrics from a
	   representative workload on a default filesystem
	d) spent some time analysing the baseline metrics for
	   evidence of sub-optimal performance characteristics.

Once I have that information, I can suggest meaningful ways (if any)
to change the storage and filesystem configuration that may improve
the performance of the workload.
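
One way to picture the comparison in steps (a) to (c) is the sketch below.
Everything in it is a hypothetical illustration: the metric names, the 0.8
tolerance and the helper itself are assumptions, not part of any XFS tool.
It treats the raw block device baseline as a realistic ceiling and flags
metrics where the default filesystem falls well short of it, which is where
the analysis in step (d) would start.

```python
def find_shortfalls(expected, block_baseline, fs_baseline, tolerance=0.8):
    """Flag metrics where the default filesystem underperforms.

    All arguments map a metric name to a "higher is better" number.
    tolerance is an arbitrary, assumed cut-off.
    """
    shortfalls = {}
    for metric, fs_value in fs_baseline.items():
        # Assumption: do not expect more than the raw device delivered.
        target = min(expected[metric], block_baseline[metric])
        if fs_value < tolerance * target:
            shortfalls[metric] = (fs_value, target)
    return shortfalls


if __name__ == "__main__":
    # Made-up numbers purely to show the call; not measurements.
    expected = {"seq_write_mbps": 500, "rand_read_iops": 2000}
    block = {"seq_write_mbps": 450, "rand_read_iops": 1800}
    fs = {"seq_write_mbps": 430, "rand_read_iops": 900}
    print(find_shortfalls(expected, block, fs))
```

Only the metrics returned here would be candidates for tuning; anything
already close to the device baseline is left at the defaults.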

BTW, if you ask me how to optimise an ext4 filesystem for the same
workload, I'll tell you straight up that I have no idea and that you
should ask an ext4 expert....

> The better you can test it, the better off you are, but without
> knowing what the knobs do, just doing a test and twiddling the
> knobs to do another test isn't very useful.

Well, yes, that is precisely the reason you should use the defaults.
It's also the reason we have experts - they know what knob to
twiddle to fix specific problems. If you prefer to twiddle knobs
like Blind Freddy, then you should expect things to go wrong....

> If there is a way to
> set the knobs in the general ballpark,

Have you ever considered that this is exactly what mkfs does when
you use the defaults? And that this is the fundamental reason we
keep saying "use the defaults"?

> then you can test and see
> if the performance seems adequate, if not you can try tweaking one
> of the knobs a little bit and see if it helps or hurts. but if the
> knobs aren't even in the ballpark when you start, this doesn't
> help much.

The thread has now come full circle - you're ranting about not
knowing what knobs do or how to set reasonable values, so you want to
twiddle random knobs to see if they do anything as the basis of
your optimisation process. This is the exact process that led to
the bug report that started this thread - a tweak-without-
understanding configuration leading to undesirable behavioural
characteristics from the filesystem.....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
