Re: [RFC PATCH] mm, compaction: allow compaction for GFP_NOFS requests

From: Michal Hocko
Date: Thu Oct 13 2016 - 06:05:04 EST


On Thu 13-10-16 11:29:24, Dave Chinner wrote:
> On Fri, Oct 07, 2016 at 03:18:14PM +0200, Michal Hocko wrote:
[...]
> > Unpatched kernel:
> > # Version 3.3, 16 thread(s) starting at Fri Oct 7 09:55:05 2016
> > # Sync method: NO SYNC: Test does not issue sync() or fsync() calls.
> > # Directories: Time based hash between directories across 10000 subdirectories with 180 seconds per subdirectory.
> > # File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name)
> > # Files info: size 0 bytes, written with an IO size of 16384 bytes per write
> > # App overhead is time in microseconds spent in the test not doing file writing related system calls.
> > #
> > FSUse%        Count         Size    Files/sec     App Overhead
> >      1      1600000            0       4300.1         20745838
> >      3      3200000            0       4239.9         23849857
> >      5      4800000            0       4243.4         25939543
> >      6      6400000            0       4248.4         19514050
> >      8      8000000            0       4262.1         20796169
> >      9      9600000            0       4257.6         21288675
> >     11     11200000            0       4259.7         19375120
> >     13     12800000            0       4220.7         22734141
> >     14     14400000            0       4238.5         31936458
> >     16     16000000            0       4231.5         23409901
> >     18     17600000            0       4045.3         23577700
> >     19     19200000            0       2783.4         58299526
> >     21     20800000            0       2678.2         40616302
> >     23     22400000            0       2693.5         83973996
> > Ctrl+C because it just took too long.
>
> Try running it on a larger filesystem, or configure the fs with more
> AGs and a larger log (i.e. mkfs.xfs -f -d agcount=24 -l size=512m
> <dev>). That will speed up modifications and increase concurrency.
> This test should be able to run 5-10x faster than this (it
> sustains 55,000 files/s @ 300MB/s write on my test fs on a cheap
> SSD).

Will add more memory to the machine. Will report back on that.

> > while it doesn't seem to drop the Files/sec numbers starting with
> > Count=19200000. I also see only a single
> >
> > [ 3063.815003] XFS: fs_mark(3272) possible memory allocation deadlock size 65624 in kmem_alloc (mode:0x2408240)
>
> Remember that this is emitted only after /100/ consecutive
> allocation failures. So the fact it is still being emitted means
> that the situation is not drastically better....

Yes, but we should also consider that with this particular workload,
which doesn't have a lot of anonymous memory, there is simply not all
that much to migrate, so we eventually have to wait for reclaim to
free up fs-bound memory. This patch should provide some relief but it
is not a general remedy.
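
For reference, the message Dave mentions comes from the retry loop in
XFS' kmem_alloc(), which keeps retrying the allocation and only warns
once per 100 consecutive failures. Roughly along these lines (a
paraphrase of the fs/xfs/kmem.c loop, not the exact code):

void *
kmem_alloc(size_t size, xfs_km_flags_t flags)
{
	int	retries = 0;
	gfp_t	lflags = kmem_flags_convert(flags);
	void	*ptr;

	do {
		ptr = kmalloc(size, lflags);
		/* return right away only if the caller may handle failure */
		if (ptr || (flags & (KM_MAYFAIL | KM_NOSLEEP)))
			return ptr;
		/* complain once per 100 consecutive failed attempts */
		if (!(++retries % 100))
			xfs_err(NULL,
	"%s(%u) possible memory allocation deadlock size %u in %s (mode:0x%x)",
				current->comm, current->pid,
				(unsigned int)size, __func__, lflags);
		/* back off a bit and retry */
		congestion_wait(BLK_RW_ASYNC, HZ/50);
	} while (1);
}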

> > Unpatched kernel
> > all orders
> > begin:44.718798 end:5774.618736 allocs:15019288
> > order > 0
> > begin:44.718798 end:5773.587195 allocs:10438610
> >
> > Patched kernel
> > all orders
> > begin:64.612804 end:5794.193619 allocs:16110081 [107.2%]
> > order > 0
> > begin:64.612804 end:5794.193619 allocs:11741492 [112.5%]
> >
> > which would suggest that diving into the compaction rather than backing
> > off and waiting for kcompactd to do the work for us was indeed a
> > better strategy and helped the throughput.
>
> Well, without a success/failure ratio being calculated it's hard to
> tell what improvement it made. Did it increase the success rate, or
> reduce failure latency so retries happened faster?

I have just noticed that the tracepoint also reports allocation failures
(page==(null) and pfn==0) so I can actually calculate that. Note that
only order > 3 allocations fail with the current page allocator, so I
have filtered only those (a sketch of the filtering is at the end of
this email):

Unpatched
begin:44.718798 end:5773.587195 allocs:6162244 fail:145

Patched
begin:64.612804 end:5794.193574 allocs:6869496 fail:104

So the success rate is slightly higher, although the difference is
negligible, but we seem to manage to perform ~10% more allocations, so
I assume this helped the throughput and, in turn, recycled memory
better.
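
In case it is useful, the filtering mentioned above is simple enough
to do over the mm_page_alloc trace output. A quick sketch along these
lines would do the same counting (assuming the usual "page=... pfn=...
order=..." line format, fed e.g. from trace_pipe; not necessarily the
exact post-processing I did):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count successful vs. failed page allocations of order > 3 from
 * mm_page_alloc tracepoint lines read on stdin. Failed attempts show
 * up with page=(null) and pfn=0.
 */
int main(void)
{
	char line[1024];
	unsigned long ok = 0, fail = 0;

	while (fgets(line, sizeof(line), stdin)) {
		char *order = strstr(line, "order=");
		char *page = strstr(line, "page=");

		if (!order || !page)
			continue;
		if (atoi(order + 6) <= 3)
			continue;	/* only order > 3 ever fails here */
		if (!strncmp(page + 5, "(null)", 6))
			fail++;
		else
			ok++;
	}
	printf("allocs:%lu fail:%lu\n", ok, fail);
	return 0;
}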
--
Michal Hocko
SUSE Labs