Re: Filesize limitation

Andre Uratsuka Manoel (andre@insite.com.br)
Mon, 3 Nov 1997 21:40:30 -0200 (EDT)


On Mon, 3 Nov 1997, Richard B. Johnson wrote:

-> On Mon, 3 Nov 1997, Andre Uratsuka Manoel wrote:
-> [SNIPPED]
->
-> > I didn't say it correctly the first time. The file that was
-> > reported to me as having problems being created was slightly larger
-> > than 2 GB. That file is generated every month, and every month it
-> > gets bigger and bigger. In about 6 months it will probably not fit
-> > into 4 GB either.
-> >
-> [SNIPPED]
-> There appears to be something fundamentally wrong with a program
-> that uses such a data file.
->
-> In the late '60s IBM created a sort-merge procedure, first used to
-> sort the Chicago telephone directory. It became known as the "Chicago
-> Sort". It ran on an IBM-360 with 4 kilobytes of core-RAM which had
-> to contain both the program and the data. It works.
->

I partly agree with you, but not completely.

I agree with you that some programs that use data files this
large may simply be designed wrong. I do think there are better
approaches in the case of that particular program.

But there are other possibilities, too. Sometimes you genuinely
need lots of memory, disk space, bandwidth, or CPU. When those times
come, I prefer to be limited by hardware rather than by software
constraints.

That is no excuse for badly written software, of course.
But the 2 GB limit looks a lot like the old 640 KB limit. When IBM
set the 640 KB limit, it was hard to imagine applications that could
use all that memory. But they came.

Let me state that again: I don't think that is the case for the
particular program I mentioned, but even if files that large are not
needed in this case, there are other cases to come.
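
To make the limit concrete: on 32-bit Linux today the file offset
type, off_t, is a signed 32-bit long, so offsets stop at 2^31 - 1
bytes, just under 2 GB. A minimal illustration of the arithmetic
(nothing to do with the programs I mentioned):

/* Shows where the ~2 GB ceiling comes from: the largest value a
 * signed 32-bit file offset can hold is 2^31 - 1 bytes. */
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    long limit = 2147483647L;   /* 2^31 - 1 */

    printf("sizeof(off_t) here: %lu bytes\n",
           (unsigned long) sizeof(off_t));
    printf("largest signed 32-bit offset: %ld bytes (about %ld MB)\n",
           limit, limit / (1024L * 1024L));
    return 0;
}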

-> A few weeks ago, I attempted to use the M$Garbage Editor to edit a 130
-> kilobyte text file. It reported "out of RAM" and exited. The two stories
-> are related.
->
-> Until Software starts being written by Software Engineers, who are
-> trained in engineering disciplines, we will continue to have data expand
-> like gas to fill all available space. If the space isn't big enough,
-> the programs will crash.
->
-> Given the current tendency to throw RAM and Disk Drives at a problem,
-> it is unlikely that even 64 bits will be good enough in the near future.
-> This, in spite of the fact that 64 bits exceeds the dynamic range of
-> the universe (233 dB +/- 20 dB).

->
-> Even my Sparc won't help. An 'int' on the Sparc is 32 bits. Even if
-> you find a 64-bit architecture, that doesn't mean that its file-systems
-> will support the kind of file sizes that you propose.

Alphas on Digital Unix do.
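
For what it's worth, a trivial test program like the one below (just
an illustration, not anything from those systems) shows why: under
Digital Unix on an Alpha, long and off_t come out as 64 bits while
int stays at 32; on 32-bit Linux all three are 32 bits.

/* Print the widths of the types that matter for file offsets. */
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    printf("int:   %lu bits\n", (unsigned long) (sizeof(int)   * 8));
    printf("long:  %lu bits\n", (unsigned long) (sizeof(long)  * 8));
    printf("off_t: %lu bits\n", (unsigned long) (sizeof(off_t) * 8));
    return 0;
}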

-> The solution is to use files as files. They have names for very good
-> reasons. If a "master-file" is as long as you propose, it contains
-> too much information. Such a file should contain "keys" which allow
-> records existing in other file(s) to be sorted and merged without actually
-> having to copy any data. The records in the other file(s) should contain
-> the database information.

I didn't write the programs. In fact, I recommended that they
split the files, but the programs use an old-fashioned library that
lets the files grow to that size. They will probably have to change
their programs, and it would be good if they changed them to something
completely different. As it is, though, they will just patch them to
work in this particular situation.
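
Splitting along the lines you describe might look roughly like the
sketch below. All the file names and the record layout are made up;
the point is only that sorting and merging touch the small key
entries, never the bulky records.

/* Rough sketch of a key file plus a separate data file. The key
 * file holds fixed-size entries: the sort/merge key and the offset
 * of the full record in the data file. Layout and names are
 * hypothetical. */
#include <stdio.h>

struct key_entry {
    char key[32];       /* sort/merge key, e.g. an account number */
    long data_offset;   /* where the full record lives in the data file */
};

/* Pull one full record in by seeking into the data file. */
static int fetch_record(FILE *data, const struct key_entry *k,
                        char *buf, size_t len)
{
    if (fseek(data, k->data_offset, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, len, data) == len ? 0 : -1;
}

int main(void)
{
    FILE *keys = fopen("master.keys", "rb");   /* hypothetical names */
    FILE *data = fopen("records.dat", "rb");
    struct key_entry k;
    char record[256];

    if (!keys || !data)
        return 1;

    /* Walk the small key entries sequentially; full records are
     * only read on demand, and never copied around for sorting. */
    while (fread(&k, sizeof(k), 1, keys) == 1) {
        if (fetch_record(data, &k, record, sizeof(record)) == 0)
            printf("key %.32s -> offset %ld\n", k.key, k.data_offset);
    }
    fclose(keys);
    fclose(data);
    return 0;
}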

Regards
Andre