Re: Linux 2.6.29

From: Theodore Tso
Date: Wed Mar 25 2009 - 14:30:48 EST


On Tue, Mar 24, 2009 at 12:00:41PM -0700, David Rees wrote:
> >>> Consensus seems to be something with large memory machines, lots of dirty
> >>> pages and a long writeout time due to ext3.
> >>
> >> All filesystems seem to suffer from this issue to some degree.  I
> >> posted to the list earlier trying to see if there was anything that
> >> could be done to help my specific case.  I've got a system where if
> >> someone starts writing out a large file, it kills client NFS writes.
> >> Makes the system unusable:
> >> http://marc.info/?l=linux-kernel&m=123732127919368&w=2
> >
> > Yes, I've hit 120s+ penalties just by saving a file in vim.
>
> Yeah, your disks aren't keeping up and/or data isn't being written out
> efficiently.

Agreed; we probably will need to get some blktrace outputs to see what
is going on.

> >> Only workaround I've found is to reduce dirty_background_ratio and
> >> dirty_ratio to tiny levels.  Or throw good SSDs and/or a fast RAID
> >> array at it so that large writes complete faster.  Have you tried the
> >> new vm_dirty_bytes in 2.6.29?
> >
> > No.. What would you suggest to be a reasonable setting for that?
>
> Look at whatever is there by default and try cutting them in half to start.

I'm beginning to think that using a "ratio" may be the wrong way to
go. We probably need to add an optional dirty_max_megabytes field
where we start pushing dirty blocks out when the number of dirty
blocks exceeds either the dirty_ratio or the dirty_max_megabytes,
which ever comes first. The problem is that 5% might make sense for a
small machine with only 1G of memory, but it might not make so much
sense if you have 32G of memory.

But the other problem is whether we are issuing the writes in an
efficient way, and that means we need to see what is going on at the
blktrace level as a starting point, and maybe we'll need some
custom-designed trace outputs to see what is going on at the
inode/logical block level, not just at the physical block level.

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/