Re: Performance regression in scsi sequential throughput (iozone) due to "e084b - page-allocator: preserve PFN ordering when __GFP_COLD is set"

From: Christian Ehrhardt
Date: Wed Mar 03 2010 - 01:51:47 EST

Mel Gorman wrote:
On Tue, Mar 02, 2010 at 10:18:27PM +1100, Nick Piggin wrote:
On Tue, Mar 02, 2010 at 11:01:50AM +0000, Mel Gorman wrote:
On Tue, Mar 02, 2010 at 09:36:46PM +1100, Nick Piggin wrote:
On Tue, Mar 02, 2010 at 10:04:02AM +0000, Mel Gorman wrote:
On Tue, Mar 02, 2010 at 05:52:25PM +1100, Nick Piggin wrote:
We could check further in the
slow-path but I bet it'd be very rare that the logic would be triggered. For
a process to enter the FIFO due to waiters that were not yet woken up, the
system would have to be a) under heavy memory pressure b) reclaim taking such
a long time that check_zone_pressure() is not being called in time and c)
a process exiting or otherwise freeing memory such that the watermarks are
cleared without reclaim being involved.
I don't think it would be too rare. Things can get freed up and
other allocations come in while reclaim is happening. But anyway
the nasty thing about the "rare" events is that they do add a
rare source of unexpected latency or starvation.

If processes are asleep on the waitqueue, reclaim must be active (by kswapd
if nothing else). If pages are getting freed above the necessary watermark,
then the processes will be woken up when the current shrink_zone() finished
unless unfair processes are keeping the zone below watermarks. But unless
reclaim is taking an extraordinary long length of time, there would be little
difference between waking the queue in the free path and waking it in the
reclaim path.
Reclaim can take quite a while, yes.

On the one hand, the question of whether "waiter A is not yet awoken after shrink_zone(), but greedy B has just drained the pages below the watermark again" is a good one to ask in order to make this new waitqueue approach as good as it can be.
On the other hand, you can see it this way: it is now at least waiting for the right thing, "the related watermark being restored", which will in any case be better than waiting for writes that might or might not free enough pages or, as in my case, might not even be there :-)
Additionally, its timing, even if it could be a bit racy as you described, will be much better than it is at the moment.
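
To make the "waiter A / greedy B" race concrete, here is a rough userspace model of it. This is only a sketch and not the patch itself: ZoneModel, sleep_on_pressure(), check_pressure() and resume() are hypothetical names invented for illustration, and the page counts are made up. The one point it shows is that a woken waiter has to recheck the watermark, because another allocator may have drained the zone between the wakeup and the waiter actually running.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneModel:
    """Hypothetical stand-in for a zone; not a kernel data structure."""
    free_pages: int
    watermark: int
    waiters: list = field(default_factory=list)

    def watermark_ok(self):
        return self.free_pages >= self.watermark

    def sleep_on_pressure(self, name):
        # The allocator queues itself instead of waiting for writes.
        self.waiters.append(name)

    def check_pressure(self):
        # Wake everybody only once the watermark is restored.
        if not self.watermark_ok():
            return []
        woken, self.waiters = self.waiters, []
        return woken

    def resume(self, name, pages):
        # A woken waiter rechecks; if the zone was drained in the
        # meantime it goes back to sleep rather than allocating.
        if not self.watermark_ok():
            self.sleep_on_pressure(name)
            return False
        self.free_pages -= pages
        return True


zone = ZoneModel(free_pages=10, watermark=50)
zone.sleep_on_pressure("A")           # A sleeps under memory pressure
zone.free_pages += 60                 # reclaim frees pages
woken = zone.check_pressure()         # watermark restored, A is woken
zone.free_pages -= 40                 # greedy B drains the zone first
a_allocated = zone.resume("A", 8)     # A rechecks and sleeps again
```

In this model A is merely delayed until the next time the watermark is restored; it is never left waiting for an event that may not happen at all.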


Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance