Re: [PATCH 00/13] IO-less dirty throttling

From: Wu Fengguang
Date: Wed Nov 17 2010 - 21:50:48 EST


On Thu, Nov 18, 2010 at 09:59:00AM +0800, Andrew Morton wrote:
> On Thu, 18 Nov 2010 12:40:51 +1100 Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> >
> > There's no point
> > waking a dirtier if all they can do is write a single page before
> > they are throttled again - IO is most efficient when done in larger
> > batches...
>
> That assumes the process was about to do another write. That's
> reasonable on average, but a bit sad for interactive/rtprio tasks. At
> some stage those scheduler things should be brought into the equation.

Interactive/rtprio tasks are given a 1/4 bonus in
global_dirty_limits(). So when there are lots of heavy dirtiers,
the interactive/rtprio tasks will be soft throttled at
(6~8)*bdi_bandwidth. We can increase that to (12~16)*bdi_bandwidth
or whatever turns out to be appropriate.

> >
> > ...
> >
> > Yeah, sorry, should have posted them - I didn't because I snapped
> > the numbers before the run had finished. Without series:
> >
> > 373.19user 14940.49system 41:42.17elapsed 612%CPU (0avgtext+0avgdata 82560maxresident)k
> > 0inputs+0outputs (403major+2599763minor)pagefaults 0swaps
> >
> > With your series:
> >
> > 359.64user 5559.32system 40:53.23elapsed 241%CPU (0avgtext+0avgdata 82496maxresident)k
> > 0inputs+0outputs (312major+2598798minor)pagefaults 0swaps
> >
> > So the wall time with your series is lower, and system CPU time is
> > way down (as I've already noted) for this workload on XFS.
>
> How much of that benefit is an accounting artifact, moving work away
> from the calling process's CPU and into kernel threads?

The elapsed time can't be cheated, and it goes down from 41:42 to 40:53.

For the CPU time, I have system-wide numbers collected with iostat.
Quoting the changelog of the first patch:

- 1 dirtier case: no change
- 10 dirtiers case: CPU system time is reduced to 50%
- 100 dirtiers case: CPU system time is reduced to 10%, IO size and throughput increase by 10%

        2.6.37-rc2                      2.6.37-rc1-next-20101115+
        ----------------------------    ----------------------------
        %system      wkB/s  avgrq-sz    %system      wkB/s  avgrq-sz
100dd    30.916  37843.000   748.670      3.079  41654.853   822.322
100dd    30.501  37227.521   735.754      3.744  41531.725   820.360

10dd     39.442  47745.021   900.935     20.756  47951.702   901.006
10dd     39.204  47484.616   899.330     20.550  47970.093   900.247

1dd      13.046  57357.468   910.659     13.060  57632.715   909.212
1dd      12.896  56433.152   909.861     12.467  56294.440   909.644

(Ndd = N concurrent dd dirtiers; %system = system CPU time;
wkB/s = kB written per second; avgrq-sz = average request size in
512-byte sectors.)

Those are real CPU savings :)

Thanks,
Fengguang