bdflush problem ?

François Désarménien (desar@club-internet.fr)
Thu, 25 Nov 1999 23:12:19 +0100


I've experienced a very unpleasant loss of data at a customer site today,
which seems strange to me, with a 2.2.9 (Mandrake 6.0) non-smp kernel.

At 8:30 am, the system FTPed around 10 files from another server, then
loaded them into an Oracle database.

At 3:00 pm, the power went down.

Fsck ran smoothly, but the database did not recover because of a corrupted rollback
segment tablespace (BTW, it seems databases are *always* corrupted after a crash
with 2.2 kernels, even with AIO disabled, which does not seem to be the case with 2.0
kernels AFAIK).

Having a cold backup from last night, I restored the database, and then prepared
to reload the files fetched by FTP that morning, BUT... the files were gone.

Even the directory containing them had a time-stamp of *yesterday* morning.

I first suspected the files had not been transferred, but alas, I have
proof (from the other server) that they were.

Now I wonder: this server has a lot of memory (128MB), so if all the changes
were kept in memory and bdflush did not do its job, that would lead to exactly this
situation.

I've been using uni*es for years now, and was used to having bdflush run
every 30 seconds to flush dirty buffers, so that this couldn't happen.

It seems that with 2.2 kernels, the rules for flushing to disk have been improved
to a more efficient strategy.
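
For reference, here is a minimal sketch of how the current bdflush tunables could
be inspected. It assumes the kernel exposes them as a single line of integers in
/proc/sys/vm/bdflush; the helper names parse_bdflush and read_bdflush are
hypothetical, and the meaning of each field should be checked against the kernel's
own Documentation/sysctl/vm.txt rather than taken from this sketch.

```python
# Assumed location of the bdflush tunables; not present on every kernel.
BDFLUSH_PATH = "/proc/sys/vm/bdflush"


def parse_bdflush(text):
    """Split a whitespace-separated line of integers into a list of ints."""
    return [int(token) for token in text.split()]


def read_bdflush(path=BDFLUSH_PATH):
    """Return the tunables as a list of ints, or None if the file is unavailable."""
    try:
        with open(path) as f:
            return parse_bdflush(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    values = read_bdflush()
    if values is None:
        print("no bdflush tunables exposed at", BDFLUSH_PATH)
    else:
        # Fields are printed by position only; their meaning is kernel-specific.
        for index, value in enumerate(values):
            print("field", index, "=", value)
```

Comparing these values between a 2.0 and a 2.2 machine might show whether the
flushing thresholds really differ, but that is only a guess on my part.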

My questions are:

- is my scenario of bdflush not writing dirty buffers for 7 hours plausible ?
- could the improvement of the bdflush strategy make the filesystem much more sensitive to crashes ?
- are there known issues about it ?
- is there something I could do to avoid this kind of situation (without losing too much speed) ?
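
On the last question, the only application-level workaround I can think of is to
force each received file to disk explicitly instead of relying on bdflush. A
minimal sketch, assuming the transfer script can be changed and that the
filesystem accepts an fsync on a directory (the function name write_durably is
hypothetical):

```python
import os


def write_durably(path, data):
    """Write data to path and ask the kernel to push it to disk before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # drain the userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write this file's dirty buffers

    # Assumption: syncing the parent directory also persists the new entry.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This costs one synchronous write per file, which should be acceptable for around
10 files a day, but it does nothing for the database files themselves.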

Any clues (my bosses are wondering if Linux really was a good choice, so I've got to be fast to
prove to them that it was) ?

Thank you for your time,

François Désarménien

PS: please, if you reply with "upgrade to 2.2.1x", give me some clues about what the problem was: I
have to justify that the problem will go away.
