Re: [PATCH] VM patch 3 for -ac7

From: Zlatko Calusic (zlatko@iskon.hr)
Date: Sun Jun 04 2000 - 12:46:27 EST


Hi, Rik!

I tested all versions of your autotune patch (1-3) and am mostly
satisfied with the direction of the development. But still, I have
some objections and lots of questions. :)

First, something that has been bothering me for a long time now (2.3.42
keeps receding into the past, and I have chosen that kernel version as
my reference for code that doesn't exhibit this particular bad
behaviour):

Bulk I/O is performing terribly. Spurious swapping is killing us as we
read big chunks of data from disk. Ext2 optimizations and
anti-fragmentation code are now officially obsolete, because they never
get a chance to take effect. For example, check the following
chunk of "vmstat 1" output:

 0  0  0   6976  85140    232   4772   0   0     0     0  101   469   0   0  99
 0  1  0   6976  74532    244  15284   0   0  2638     7  290   802   1   4  94
 1  0  0   6976  59028    260  30772   0   0  3876     0  347   892   0   5  94
 0  1  0   6976  43012    276  46772   0   0  4004     0  356   779   0   5  95
 1  0  0   6976  26964    292  62900   0   0  4036     0  355   918   0   6  93
 0  1  0   6976  10852    308  78900   0   0  4004     0  355   931   0   5  94
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 2  0  0   7304   3128    184  89120   0  56  2978    26  305   780   1  14  85
 1  0  1   8084   2916    156  90320   0 220  2659    55  306   764   0  18  82
 0  2  0   9448   2112    168  92236 104 312  1873    78  281   790   0  11  88
 0  2  0   9916   2656    180  92016 264 212   795    53  199   465   0   4  96
 0  1  1  10340   2956    192  92024   0 288  2175    72  268   727   1  10  89
 0  2  0  10460   1936    204  92928  24 308  2588    77  296   804   1   6  93
 0  1  0  10772   2028    208  93080  16 456  1706   114  252   648   0   8  92
 1  0  1  10824   2900    204  92232   0 556  2402   139  298   784   0   5  94
 0  2  0  10868   2036    192  93124  24 140  2767    35  301   844   0   9  91
 0  2  0  11080   1944    192  93460  16 104  2526    26  286   836   0   6  94
 0  1  0  11620   2604    192  93220   4  88  2553    22  277   760   0  10  90
 0  1  0  11816   2164    196  93844   0 264  2620    66  292   792   0   9  91
 0  2  0  12084   1840    204  94320  80 196  1416    49  232   567   0   5  95
 0  1  0  12084   1708    216  94352 240   0  1467     0  219   676   0   1  98

At time T (top of the output), I started reading a big file from the
4k ext2 FS. The machine is completely idle, and as you can see has
lots of memory free. Before the memory gets filled (first few lines),
you can also see that data is coming at a 16MB/sec pace (bi ~ 4000),
which is _exactly_ the available (and expected) bandwidth.

And *then* we get into trouble. The VM kicks in and starts to swap in
an' out at will. The disk heads start thrashing with sounds similar to
the ones heard when running netscape on a 16MB machine. Of course, the
reading speed drops drastically, and in the end we finish 10 seconds
later than expected. I/O bandwidth is effectively halved, and why?
Because we enlarged the page cache from a completely satisfying 90MB!!!
to 95MB (by 5%!), and to get that pissy 5MB we were swapping out like
mad, then processes started recollecting their pages back from the
disk, then all over again...
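
To put rough numbers on "effectively halved", here is a minimal sketch
(Python, not part of the original test) that averages the bi column
from the vmstat output above for the two phases. The variable names
are made up for illustration, and the split into a "fill" phase and a
"swap" phase is an assumption based on where the so column becomes
non-zero. The sketch also compares cache growth with bi to check that
one bi unit corresponds to about 4 KB on this machine, which is what
the 16MB/sec figure implies.

```python
# Values copied from the "vmstat 1" output quoted in this message.
# Fill phase: free memory is still being consumed, no swap traffic.
fill_bi = [2638, 3876, 4004, 4036, 4004]
fill_cache_kb = [15284, 30772, 46772, 62900, 78900]
# Swap phase: every line after the repeated vmstat header.
swap_bi = [2978, 2659, 1873, 795, 2175, 2588, 1706,
           2402, 2767, 2526, 2553, 2620, 1416, 1467]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Estimate the size of one "bi" unit: cache growth over a one-second
# interval divided by the bi value reported for that interval.
kb_per_bi_unit = [
    (fill_cache_kb[i] - fill_cache_kb[i - 1]) / fill_bi[i]
    for i in range(1, len(fill_bi))
]

fill_rate = mean(fill_bi)   # average bi before swapping starts
swap_rate = mean(swap_bi)   # average bi once swapping has started
ratio = swap_rate / fill_rate

print("KB per bi unit (estimates):",
      [round(x, 2) for x in kb_per_bi_unit])
print("average bi while filling memory: %.0f" % fill_rate)
print("average bi while swapping:       %.0f" % swap_rate)
print("swap-phase rate relative to fill phase: %.0f%%" % (ratio * 100))
```

With these numbers the read rate during the swap phase comes out at
somewhere between 55% and 60% of the rate seen while memory was being
filled, depending on whether the first, partial second (bi 2638) is
counted in the fill phase.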

Now the question is: is this behaviour expected, or will it get
fixed before the final 2.4.0?

I'm worried that we are going to release the ultimate swapping machine
and say to people: here is the new an' great stable kernel! Watch it
swap and never stop. :)

What especially bothers me is that nobody sees the problem. Everybody
is talking about a better and better kernel, how things are getting
stable and response time is getting better, but I see new releases
getting worse and worse, with performance going down the drain. Tell me
that I'm an idiot, that the system is supposed to swap all the time,
and then maybe I'll stop bitching. But not before. :)

Second thing: Around two years ago, IIRC, Linus and people on this
group decided that we don't need page aging, that it only kills
performance, and thus the code was removed, not to be seen again. I
wasn't so sure then that it was such a good idea, but when Andrea's LRU
page management got in, I became very satisfied with our page
replacement policies.

Obviously, with zoned memory we got into trouble once again, and now,
as a solution, you're putting page aging into the kernel again. Could
you tell us what your reasons are? What has changed in the meantime, so
that we didn't need page aging two years ago, and now we need it?

I didn't find any improvement (as can be seen from the vmstat output
above). Yes, I'm well aware of what you are _trying_ to achieve, but
in reality, you've just added lots of logic to the kernel and
accomplished nothing. Not to mention that we're back at 2.1.something
and history is repeating itself. :(

Gratuitously adding untested code this late in development doesn't
look like a very good idea to me.

In the end, I hope nobody sees this rather long complaint as an
attack, but rather as a call for a debate on how we could improve the
kernel and hopefully get 2.4.0 out sooner. A 2.4.0 we'll be proud of.

Regards,

-- 
Zlatko
