maybe a buffers bug? - Re: NTFS module is buggy

From: Anton Altaparmakov (aia21@cam.ac.uk)
Date: Fri Jun 09 2000 - 10:37:37 EST


After running "find . -type f -exec md5sum {} \;" on an NTFS partition,
I _sometimes_ get the following error in bash:

        find: cannot fork: Cannot allocate memory

But this only happens _sometimes_: it does not happen at the same file
every time, and sometimes the find completes without any errors. I have
4, 8 and 10GB NTFS partitions; it doesn't happen on the 4GB one but does
happen on both the others. So far it seems to occur at files with very
long file names, but this might be pure coincidence. After all, I have
only tried it a couple of times, and since it's not the same file every
time it doesn't seem to be as simple as a corrupt file.
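
To pin down whether the failure really moves from file to file, the run could be repeated with the failing path recorded each time. Below is a minimal Python sketch of that idea, not what was actually run: the names first_failure and repeat_runs are made up for illustration, and it assumes that a failed fork surfaces in Python as an OSError (the errno is recorded so ENOMEM can be told apart from, say, a missing md5sum binary).

```python
import os
import subprocess

def first_failure(root, cmd=("md5sum",)):
    """Run cmd once per regular file under root, as find -exec would.

    Returns (path, errno) for the first file where spawning the child
    failed, or None if every child could be started.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Mirror "find -type f": skip symlinks and non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                subprocess.run(list(cmd) + [path],
                               stdout=subprocess.DEVNULL, check=False)
            except OSError as exc:
                # Assumption: "cannot fork" shows up here as an OSError.
                return path, exc.errno
    return None

def repeat_runs(root, runs=5, cmd=("md5sum",)):
    """Repeat the walk and collect the outcome of each run."""
    return [first_failure(root, cmd) for _ in range(runs)]
```

Calling repeat_runs on the mount point of one of the affected partitions and comparing the returned paths would show whether the same file, or the same kind of file name, keeps turning up.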

Also, once it has happened, it happens again much more easily. I.e.
something is going wrong somewhere and the system thinks it's out of memory!

I also get messages of the form "eth0: Memory squeeze, deferring packet",
which I have never seen before, but I assume they indicate out of memory
as well.

However, looking at free shows plenty of memory in buffers which could be
freed - which might not be happening, of course?
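
As a rough sanity check on that observation, the arithmetic can be sketched in Python using one sample from the first vmstat trace at the bottom of this email (free 1148, buff 24968, cache 3384, on the 48MB machine). This assumes vmstat prints those columns in kilobytes and treats buffers plus cache as potentially reclaimable, which is a simplification; the function name memory_summary is made up for illustration.

```python
def memory_summary(free_kb, buff_kb, cache_kb, ram_mb):
    """Summarise how much RAM is free versus held in buffers and cache.

    Assumption: the *_kb values are kilobytes, as printed by vmstat.
    """
    total_kb = ram_mb * 1024
    # Simplification: not all buffer/cache memory can necessarily be dropped.
    reclaimable_kb = buff_kb + cache_kb
    return {
        "total_kb": total_kb,
        "free_kb": free_kb,
        "reclaimable_kb": reclaimable_kb,
        "reclaimable_fraction": reclaimable_kb / total_kb,
    }

# Sample taken at the first context-switch spike in the trace with swap activity.
summary = memory_summary(free_kb=1148, buff_kb=24968, cache_kb=3384, ram_mb=48)
print(summary)
```

On those numbers more than half of RAM sits in buffers and cache while only about 1MB is free, which is why a fork failing with "Cannot allocate memory" looks suspicious.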

Running "vmstat 1" concurrently with the "find..." shows that just when the
out of memory message appears, the number of context switches goes up by
several orders of magnitude! And it always seems to happen after lots of
block reads and sometimes a concurrent swap-in.

Two relevant outputs from vmstat are appended to the bottom of my email.
One has concurrent swap activity and one doesn't. Very similar outputs
were produced on distinct occasions.

What I am wondering about is whether this is an NTFS bug at all. It could
be a bug somewhere else which gets triggered by the high load that the NTFS
driver causes. It is a rather slow and sluggish beast after all... Or it
might be an NTFS bug, of course. I just can't think why the context
switches go up like that!

I'll be looking into this, time permitting, but if anyone has any ideas
about how to figure out what is happening, I sure would like to hear them...

FYI: this is using the NTFS module compiled read-only, no debugging, on a
2.4.0-test1-ac10 kernel with the NTFS patch which is in -ac11. Compiling
with read-write gives the same results though. I haven't done a debug run
yet but will do, probably next week. Very busy this weekend with my Ph.D.
work.

Here is the "vmstat 1" output with swap involvement, on a 48MB RAM
machine with plenty of free swap (128MB total), UP kernel:

   procs                   memory    swap       io       system         cpu
 r  b  w  swpd  free   buff cache   si so   bi  bo   in      cs  us  sy  id
 1  0  0  3676   964  25620  3356    0  0   15   0  146     128  23  65  12
 0  1  0  3676   916  25480  3356    0  0   13   1  133     126  35  63   2
 3  0  0  3676   740  25384  3356    0  0   24   0  145     132  24  55  22
 0  1  0  3676  1212  25052  3356    0  0   13   0  132     129  38  55   7
-> out of memory message appears just here or just after this, and the eth0
messages start around here too.
 1  0  1  3676  1148  24968  3384   20  0   18   0  138  110239  23  75   2
 1  0  0  3676  1148  24968  3384    0  0    0   0  113  465612   0 100   0
 1  0  1  3676  1148  24968  3384    0  0    0   7  123  462163   1  99   0
 1  0  1  3676  1148  24968  3384    0  0    0   0  123  467360   0 100   0
 0  0  0  3608  1108  24940  3488  108  0   36   0  150  392320   0  88  12
 0  0  0  3608  1108  24940  3488    0  0    0   0  132       5   1  43  56
 0  0  0  3608  1108  24940  3488    0  0    0   0  126       6   0  97   3
 0  0  0  3608  1108  24940  3488    0  0    0   6  134       9   0  95   5
 0  0  0  3608  1108  24940  3488    0  0    0   0  120       8   0  96   4
 0  0  0  3608  1108  24940  3488    0  0    0   0  114       5   0  98   2

Here is the "vmstat 1" output without swap involvement: same machine, same
partition, though it happened at a different place:

   procs                   memory    swap       io       system         cpu
 r  b  w  swpd  free   buff cache   si so   bi  bo   in      cs  us  sy  id
 0  0  0  3920  1268  15856  3968    0  0    0   0  121       6   0  96   4
 0  0  0  3920  1268  15856  3968    0  0    0   0  107       9   1  94   5
 0  0  0  3920  1268  15856  3968    0  0    0   0  104      11   0  95   5
 1  0  0  3920  1064  15892  3968    0  0  136   0  258     295  19  58  24
 1  0  0  3920  1040  15880  3968    0  0  354   0  465     725  12  33  55
 1  0  0  3920   976  15920  3968    0  0  335   0  448     703   9  30  61
 1  0  0  3920  1056  15872  3968    0  0  421   0  544     843  13  26  61
-> out of memory message appears just here or just after this, and the eth0
messages start around here too.
 1  0  0  3920  1124  15768  3968    0  0  299   0  413   21450   9  31  60
 1  0  0  3920  1124  15768  3968    0  0    0   0  103  405888   1  99   0
 1  0  1  3920  1124  15768  3968    0  0    0   0  106  400260   0 100   0
 1  0  1  3920  1124  15768  3968    0  0    0   0  110  400872   0 100   0
 1  0  1  3920  1124  15768  3968    0  0    0   0  105  398976   0 100   0
 1  0  1  3920  1120  15768  3968    0  0    0   0  117  401800   1  99   0
 0  0  0  3920  1272  15736  3968    0  0    0   0  110  267229   0  66  34
 0  0  0  3920  1272  15736  3968    0  0    0   0  117       6   0  90  10
 0  0  0  3920  1272  15736  3968    0  0    0   0  110       6   0  96   4
 0  0  0  3920  1272  15736  3968    0  0    0   0  105       6   0  96   4
 0  0  0  3920  1272  15736  3968    0  0    0  20  123      11   1  93   6
 0  0  0  3920  1272  15736  3968    0  0    0   0  103       6   1  96   3
 0  0  0  3920  1272  15736  3968    0  0    0   0  113       6   0  97   3
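
For anyone wanting to spot the jump in a longer capture, here is a minimal Python sketch that parses "vmstat 1" samples and flags rows where the cs column grows by a large factor over the previous sample. It assumes the 16-column layout shown above with each sample on a single line (rows that a mailer has wrapped would need rejoining first); the helper names and the factor of 10 are arbitrary choices for illustration, and the sample rows are copied from the second trace.

```python
COLUMNS = ["r", "b", "w", "swpd", "free", "buff", "cache",
           "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id"]

def parse_vmstat(text):
    """Return one dict per sample line; headers and annotations are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows

def cs_spikes(rows, factor=10):
    """Indices of samples whose cs is at least `factor` times the previous one."""
    spikes = []
    for i in range(1, len(rows)):
        prev = max(rows[i - 1]["cs"], 1)  # avoid a zero baseline
        if rows[i]["cs"] >= factor * prev:
            spikes.append(i)
    return spikes

SAMPLE = """\
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 3920 1056 15872 3968 0 0 421 0 544 843 13 26 61
-> out of memory message appears just here
1 0 0 3920 1124 15768 3968 0 0 299 0 413 21450 9 31 60
1 0 0 3920 1124 15768 3968 0 0 0 0 103 405888 1 99 0
"""

rows = parse_vmstat(SAMPLE)
for i in cs_spikes(rows):
    print(i, rows[i - 1]["cs"], "->", rows[i]["cs"])
```

Comparing each sample only with its predecessor keeps the sketch short; a comparison against a longer idle baseline would be less sensitive to a single noisy second.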

Regards,

        Anton

--

"Education is what remains after one has forgotten everything he learned in school." - Albert Einstein

--
Anton Altaparmakov    Voice: 01223-333541(lab) / 07712-632205(mobile)
Christ's College      eMail: AntonA@bigfoot.com
Cambridge CB2 3BU     ICQ: 8561279
United Kingdom        WWW: http://www-stu.christs.cam.ac.uk/~aia21/



