Re: [Bug #14141] order 2 page allocation failures in iwlagn

From: Frans Pop
Date: Mon Oct 26 2009 - 17:06:34 EST


On Tuesday 20 October 2009, Mel Gorman wrote:
> I've attached a patch below that should allow us to cheat. When it's
> applied, it outputs who called congestion_wait(), how long the timeout
> was and how long it waited for. By comparing before and after sleep
> times, we should be able to see which of the callers has significantly
> changed and if it's something easily addressable.

The results from this look fairly interesting (although I may be a bad
judge as I don't really know what I'm looking at ;-).

I've tested with two kernels:
1) 2.6.31.1: 1 test run
2) 2.6.31.1 + congestion_wait() reverts: 2 test runs

The 1st kernel had the expected "freeze" while reading commits in gitk;
with the 2nd kernel, reading commits was noticeably smoother.
I did 2 runs with the 2nd kernel as the first run had a fairly long music
skip and more SKB errors than expected. The second run was fairly normal
with no music skips at all even though it had a few SKB errors.

Data for the tests:
                          1st kernel   2nd kernel 1   2nd kernel 2
end reading commits       1:15         1:00           0:55
"freeze"                  yes          no             no
branch data shown         1:55         1:15           1:10
system quiet              2:25         1:50           1:45
# SKB allocation errors   10           53             5

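Note that the test is substantially faster with the 2nd kernel (system
quiet after 1:50 and 1:45, versus 2:25 with the 1st kernel) and that the
SKB errors don't really affect the duration of the test: the run with 53
errors took only 5 seconds longer than the run with 5.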


Attached a tarball with the kernel logs, both the full logs and a stripped
version with only the lines generated during the actual test.
Something like this will extract the debug data from the logs:
$ grep "delay " <file> | sed "s/^.*\] //"
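
For anyone who would rather post-process the logs in a script, the same
extraction can be sketched in Python. This is only a rough equivalent of
the pipeline above: the function name extract_delay_lines is made up for
illustration, and it assumes nothing about the debug lines beyond what the
grep/sed command already assumes (they contain "delay " and the part of
interest follows the last "] ").

```python
import re
import sys


def extract_delay_lines(lines):
    # Hypothetical helper mirroring: grep "delay " | sed "s/^.*\] //"
    result = []
    for line in lines:
        line = line.rstrip("\n")
        # grep "delay ": keep only lines containing the debug marker
        if "delay " not in line:
            continue
        # sed "s/^.*\] //": the greedy match drops everything up to and
        # including the last "] " on the line (timestamp and any prefix)
        result.append(re.sub(r"^.*\] ", "", line))
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extract_delay.py <file>
    with open(sys.argv[1], errors="replace") as log:
        for entry in extract_delay_lines(log):
            print(entry)
```

Like the sed expression, this strips up to the last "] " on each line, so
a debug line that itself contained "] " would be truncated too far.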

Also attached an ODF spreadsheet with a summary of the data for all 3 tests.
I've dropped the congestion_wait and sync/rw= columns as they were always
the same (rw=1 for 1st kernel and sync=0 for 2nd kernel).
I've added a column "weighed delay" and totals for that column and the
count column.
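
The spreadsheet formulas are not reproduced in this mail, so the following
is only a sketch of how such a column and its totals could be computed. It
assumes, and this is an assumption rather than something stated above,
that the "weighed delay" of a row is simply its count multiplied by its
delay. The function name and the (caller, count, delay) row layout are
hypothetical.

```python
def weighted_totals(rows):
    # rows: iterable of (caller, count, delay) tuples (hypothetical layout)
    # Assumption: weighted delay of a row = count * delay
    table = []
    total_count = 0
    total_weighted = 0
    for caller, count, delay in rows:
        weighted = count * delay
        table.append((caller, count, delay, weighted))
        total_count += count
        total_weighted += weighted
    # Return the per-row table plus the totals of the count and
    # weighted-delay columns
    return table, total_count, total_weighted
```

Comparing the weighted totals per caller between kernels gives a rough
idea of where the sleep time goes, which is what the observations below
are based on.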

My layman's observations are:
- without the revert 'background_writeout' is called a lot less frequently,
but when it's called it gets long delays
- without the revert you have 'wb_kupdate', which is relatively expensive
- with the revert 'shrink_list' is relatively expensive, although not
really in absolute terms

You people may want to look at exactly what happens directly around the SKB
allocation errors. I've only looked at totals.

Cheers,
FJP

Attachment: logs.tgz
Description: application/tgz

Attachment: results.ods
Description: application/vnd.oasis.opendocument.spreadsheet