Re: [PATCH v3 1/2] writeback: add dirty_background_centisecs per bdi variable

From: Namjae Jeon
Date: Tue Sep 25 2012 - 02:54:00 EST


2012/9/25, Namjae Jeon <linkinjeon@xxxxxxxxx>:
> 2012/9/25, Dave Chinner <david@xxxxxxxxxxxxx>:
>> On Thu, Sep 20, 2012 at 04:44:22PM +0800, Fengguang Wu wrote:
>>> [ CC FS and MM lists ]
>>>
>>> Patch looks good to me, however we need to be careful because it's
>>> introducing a new interface. So it's desirable to get some acks from
>>> the FS/MM developers.
>>>
>>> Thanks,
>>> Fengguang
>>>
>>> On Sun, Sep 16, 2012 at 08:25:42AM -0400, Namjae Jeon wrote:
>>> > From: Namjae Jeon <namjae.jeon@xxxxxxxxxxx>
>>> >
>>> > This patch is based on suggestion by Wu Fengguang:
>>> > https://lkml.org/lkml/2011/8/19/19
>>> >
>>> > The kernel has a mechanism to do writeback as per dirty_ratio and
>>> > dirty_background_ratio. It also maintains a per-task dirty rate limit
>>> > to keep the number of dirty pages balanced at any given instant by
>>> > doing bdi bandwidth estimation.
>>> >
>>> > Kernel also has max_ratio/min_ratio tunables to specify percentage of
>>> > writecache to control per bdi dirty limits and task throttling.
>>> >
>>> > However, there might be a use case where the user wants a per-bdi
>>> > writeback tuning parameter to flush dirty data once the per-bdi dirty
>>> > data reaches a threshold, especially on an NFS server.
>>> >
>>> > dirty_background_centisecs provides an interface where user can tune
>>> > background writeback start threshold using
>>> > /sys/block/sda/bdi/dirty_background_centisecs
>>> >
>>> > dirty_background_centisecs is used along with the average bdi write
>>> > bandwidth estimation to start background writeback.
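To make the relationship between the tunable and the bandwidth estimate concrete, here is a minimal sketch of one plausible reading of the description: background writeback starts once the bdi holds more dirty data than it could write out in dirty_background_centisecs at its estimated bandwidth. The function name, the byte units and the treatment of 0 as "disabled" are assumptions made for illustration, not code taken from the patch.

```python
def background_threshold_bytes(avg_write_bandwidth, dirty_background_centisecs):
    """Dirty bytes at which background writeback would start on one bdi.

    avg_write_bandwidth is in bytes per second (hypothetical unit choice).
    Returns None when the tunable is 0, which this sketch assumes means
    "feature disabled, fall back to the global thresholds".
    """
    if dirty_background_centisecs == 0:
        return None
    # centiseconds -> seconds is a division by 100
    return avg_write_bandwidth * dirty_background_centisecs // 100


usb_disk = 25 * 1024 * 1024  # the ~25MB/s USB disk from the use case
for centisecs in (0, 100, 200, 300):
    print(centisecs, background_threshold_bytes(usb_disk, centisecs))
```

Under this reading, a smaller value means writeback kicks in with less dirty data queued.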
>>> >
>>> > One use case that demonstrates the patch functionality is an NFS
>>> > setup:
>>> > We have an NFS setup with an Ethernet line of 100Mbps, while the USB
>>> > disk attached to the server has a local speed of 25MB/s. Server and
>>> > client are both ARM target boards.
>>> >
>>> > Now if we perform a write operation over NFS (client to server), as
>>> > per the network speed, data can travel at max speed of 100Mbps. But
>>> > if we check the default write speed of USB hdd over NFS it comes
>>> > around to 8MB/sec, far below the speed of network.
>>> >
>>> > The reason is that, as per the NFS logic, during a write operation
>>> > pages are initially dirtied on the NFS client side; then, after
>>> > reaching the dirty threshold/writeback limit (or in case of sync), the
>>> > data is actually sent to the NFS server (so now pages are dirtied again
>>> > on the server side). This will be done in the COMMIT call from client
>>> > to server, i.e. if 100MB of data is dirtied and sent, it will take a
>>> > minimum of 100MB/100Mbps ~ 8-9 seconds.
>>> >
>>> > After the data is received, it will take approx 100/25 ~ 4 seconds to
>>> > write the data to the USB HDD on the server side. This makes the overall
>>> > time to write this much data ~12 seconds, which in practice comes out
>>> > to be near 7 to 8MB/second. After this a COMMIT response will be sent
>>> > to the NFS client.
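The arithmetic above can be checked with a short sketch. It assumes the 100Mbps link and 25MB/s disk from the setup description, ignores protocol overhead, and models the two phases as strictly sequential; the function name is hypothetical.

```python
def serial_throughput_mb_s(data_mb, link_mbit_s, disk_mb_s):
    """Effective MB/s when the disk write starts only after the transfer ends."""
    network_s = data_mb * 8 / link_mbit_s  # MB -> Mbit, then divide by Mbit/s
    disk_s = data_mb / disk_mb_s
    return data_mb / (network_s + disk_s)


estimate = serial_throughput_mb_s(data_mb=100, link_mbit_s=100, disk_mb_s=25)
print(f"{estimate:.2f} MB/s")
```

With 8 seconds on the wire and 4 seconds on the disk, the model lands close to the ~8MB/sec reported for the default settings.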
>>> >
>>> > However, we may improve this write performance by making use of the NFS
>>> > server's idle time, i.e. while data is being received from the client,
>>> > simultaneously initiate the writeback thread on the server side. So
>>> > instead of waiting for the complete data to arrive and then starting
>>> > the writeback, we can work in parallel while the network is still busy
>>> > receiving the data. In this way overall performance will be improved.
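As a rough illustration of why overlapping helps, the sketch below models receive and writeback as a two-stage pipeline over fixed-size chunks. The chunk size is a hypothetical stand-in for how early background writeback kicks in; it is not a parameter of the patch, and the model ignores protocol and seek overhead.

```python
def overlapped_throughput_mb_s(data_mb, link_mbit_s, disk_mb_s, chunk_mb):
    """Effective MB/s when each received chunk is written back while the
    next chunk is still arriving. Assumes data_mb is a multiple of chunk_mb."""
    net_s = chunk_mb * 8 / link_mbit_s   # time to receive one chunk
    disk_s = chunk_mb / disk_mb_s        # time to write one chunk
    received_at = 0.0
    written_at = 0.0
    for _ in range(data_mb // chunk_mb):
        received_at += net_s
        # the disk can start a chunk only once it has arrived and the
        # previous chunk has been written
        written_at = max(written_at, received_at) + disk_s
    return data_mb / written_at


print(f"{overlapped_throughput_mb_s(100, 100, 25, chunk_mb=10):.2f} MB/s")
```

In this model the slower stage (here the 100Mbps link, about 12.5MB/s) bounds the result, so smaller chunks move the estimate toward the link speed rather than past it.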
>>> >
>>> > If we tune dirty_background_centisecs, we can see there
>>> > is an increase in performance, and it comes out to be ~11MB/second.
>>> > Results are:
>>> >
>>> > Write test(create a 1 GB file) result at 'NFS client' after changing
>>> > /sys/block/sda/bdi/dirty_background_centisecs
>>> > on *** NFS Server only - not on NFS Client ****
>>
>
> Hi Dave,
>
>> What is the configuration of the client and server? How much RAM,
>> what their dirty_* parameters are set to, network speed, server disk
>> speed for local sequential IO, etc?
> These results are on ARM with 512MB RAM and XFS over NFS with default
> writeback settings (only our writeback setting,
> dirty_background_centisecs, was changed, on the NFS server only).
> Network speed is ~100MB/sec and
Sorry, there is a typo :)
^^100Mb/sec
> local disk speed is ~25MB/sec.
>
>>
>>> > -----------------------------------------------------------------------
>>> > | WRITE Test with various 'dirty_background_centisecs' at NFS Server  |
>>> > -----------------------------------------------------------------------
>>> > |          | default = 0 | 300 centisec | 200 centisec | 100 centisec |
>>> > -----------------------------------------------------------------------
>>> > | RecSize  | WriteSpeed  | WriteSpeed   | WriteSpeed   | WriteSpeed   |
>>> > -----------------------------------------------------------------------
>>> > | 10485760 | 8.44MB/sec  | 8.60MB/sec   | 9.30MB/sec   | 10.27MB/sec  |
>>> > |  1048576 | 8.48MB/sec  | 8.87MB/sec   | 9.31MB/sec   | 10.34MB/sec  |
>>> > |   524288 | 8.37MB/sec  | 8.42MB/sec   | 9.84MB/sec   | 10.47MB/sec  |
>>> > |   262144 | 8.16MB/sec  | 8.51MB/sec   | 9.52MB/sec   | 10.62MB/sec  |
>>> > |   131072 | 8.48MB/sec  | 8.81MB/sec   | 9.42MB/sec   | 10.55MB/sec  |
>>> > |    65536 | 8.38MB/sec  | 9.09MB/sec   | 9.76MB/sec   | 10.53MB/sec  |
>>> > |    32768 | 8.65MB/sec  | 9.00MB/sec   | 9.57MB/sec   | 10.54MB/sec  |
>>> > |    16384 | 8.27MB/sec  | 8.80MB/sec   | 9.39MB/sec   | 10.43MB/sec  |
>>> > |     8192 | 8.52MB/sec  | 8.70MB/sec   | 9.40MB/sec   | 10.50MB/sec  |
>>> > |     4096 | 8.20MB/sec  | 8.63MB/sec   | 9.80MB/sec   | 10.35MB/sec  |
>>> > -----------------------------------------------------------------------
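To summarise the table, the sketch below averages each column over the ten record sizes and reports the gain relative to the default (0) column. The numbers are copied from the table above; the variable names are only illustrative.

```python
from statistics import mean

# MB/s per record size, top row (10485760) to bottom row (4096),
# keyed by the dirty_background_centisecs value used on the server.
write_speed = {
    0:   [8.44, 8.48, 8.37, 8.16, 8.48, 8.38, 8.65, 8.27, 8.52, 8.20],
    300: [8.60, 8.87, 8.42, 8.51, 8.81, 9.09, 9.00, 8.80, 8.70, 8.63],
    200: [9.30, 9.31, 9.84, 9.52, 9.42, 9.76, 9.57, 9.39, 9.40, 9.80],
    100: [10.27, 10.34, 10.47, 10.62, 10.55, 10.53, 10.54, 10.43, 10.50, 10.35],
}

baseline = mean(write_speed[0])
gains = {cs: mean(v) / baseline - 1 for cs, v in write_speed.items() if cs != 0}

for cs in sorted(gains, reverse=True):
    print(f"{cs:3d} centisec: mean {mean(write_speed[cs]):.2f} MB/s, "
          f"gain {gains[cs] * 100:.1f}% over default")
```

The gain grows as the value shrinks, and the record size has little visible effect within any one column.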
>>
>> While this set of numbers looks good, it's very limited in scope.
>> I can't evaluate whether the change is worthwhile or not from this
>> test. If I was writing this patch, the questions I'd be seeking to
>> answer before proposing it for inclusion are as follows....
>>
>> 1. what's the comparison in performance to typical NFS
>> server writeback parameter tuning? i.e. dirty_background_ratio=5,
>> dirty_ratio=10, dirty_expire_centisecs=1000,
>> dirty_writeback_centisecs=1? i.e. does this change give any
>> benefit over the current common practice for configuring NFS
>> servers?
>>
>> 2. what happens when you have 10 clients all writing to the server
>> at once? Or 100? NFS servers rarely have a single writer to a
>> single file at a time, so what impact does this change have on
>> multiple concurrent file write performance from multiple clients?
>>
>> 3. Following on from the multiple client test, what difference does it
>> make to file fragmentation rates? Writing more frequently means
>> smaller allocations and writes, and that tends to lead to higher
>> fragmentation rates, especially when multiple files are being
>> written concurrently. Higher fragmentation also means lower
>> performance over time as fragmentation accelerates filesystem aging
>> effects on performance. IOWs, it may be faster when new, but it
>> will be slower 3 months down the track and that's a bad tradeoff to
>> make.
>>
>> 4. What happens for higher bandwidth network links? e.g. gigE or
>> 10gigE? Are the improvements still there? Or does it cause
>> regressions at higher speeds? I'm especially interested in what
>> happens to multiple writers at higher network speeds, because that's
>> a key performance metric used to measure enterprise level NFS
>> servers.
>>
>> 5. Are the improvements consistent across different filesystem
>> types? We've had writeback changes in the past cause improvements
>> on one filesystem but significant regressions on others. I'd
>> suggest that you need to present results for ext4, XFS and btrfs so
>> that we have a decent idea of what we can expect from the change to
>> the generic code.
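The five questions above amount to a test matrix. A minimal sketch of enumerating it follows; the client counts, link speeds and filesystems are the ones named in the questions, while the labels for the writeback configurations and the choice of 100 centisecs are placeholders picked for illustration.

```python
from itertools import product

FILESYSTEMS = ["ext4", "xfs", "btrfs"]
LINK_SPEEDS = ["100Mb", "1GbE", "10GbE"]
CLIENT_COUNTS = [1, 10, 100]
# Hypothetical labels: stock settings, the common sysctl tuning from
# question 1, and the per-bdi tunable under test.
WRITEBACK_CONFIGS = ["default", "sysctl-tuned", "dirty_background_centisecs=100"]


def build_test_matrix():
    """Return one dict per benchmark run, covering every combination."""
    return [
        {"filesystem": fs, "link": link, "clients": clients, "writeback": cfg}
        for fs, link, clients, cfg in product(
            FILESYSTEMS, LINK_SPEEDS, CLIENT_COUNTS, WRITEBACK_CONFIGS
        )
    ]


matrix = build_test_matrix()
print(len(matrix), "runs; first:", matrix[0])
```

Each run would record aggregate write throughput and also a fragmentation measure for the written files, since question 3 is about behaviour over time rather than on a fresh filesystem.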
>>
>> Yeah, I'm asking a lot of questions. That's because the generic
>> writeback code is extremely important to performance and the impact
>> of a change cannot be evaluated from a single test.
> Yes, I agree.
> I will share the patch's behavior on gigabit Ethernet, on different
> filesystems (e.g. ext4, XFS and btrfs) and in a multiple NFS client setup.
>
> Thanks.
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@xxxxxxxxxxxxx
>>
>