Re: bdflush problem ?

=?iso-8859-1?Q?Fran=E7ois=20D=E9sarm=E9nien?= (desar@club-internet.fr)
Fri, 26 Nov 1999 12:01:24 +0100


Alan Cox wrote:
>
> > > Built with pgcc ?
> > How can I know ? Is there a sign in the executable itself ?
>
> "dmesg" will tell you
>

Ooops, of course !
But I'm even more puzzled, as it reports:

2.2.9-19 gcc version pgcc-2.91.66 199990314 (egcs-1.1.2 release)

so it seems to be pgcc, but it calls itsels egcs. What is it really ?

> > > if it crashed
> >
> > How could I know ? I don't have any process called 'bdflush', neither program file
> > with that name, just a man(8) entry.
>
> You would also find "oops" messages in the system log.

No, nothing in the logs neither about 'update' nor 'dbflush', but I noticed in the /etc/inittab
that /sbin/update has been commented out:

# Things to run in every runlevel.
## Linux Mandrake:
# Obsolete with 2.2.9 kernel. #
#ud::once:/sbin/update

is it normal ? What is new in 2.2.9 that makes update obsolete ?

>
> > What's wrong with it ? pgcc, egcs, gcc which are they ? Here, at home, I'm running a Mandrake 6.1
> > and 'gcc --version' reports 'pgcc-2.91.66'.
>
> pgcc is a fairly experimental compiler. We know in places it doesnt build the
> kernel correctly. Not always because of the compiler but sometimes because the
> kernel doesnt expect that kind of optimisation.
>

I see, but I'm still puzzled between gcc-pgcc-egcs

> > Maybe yes, maybe no. I submitted memory corruption to the french Oracle support, and they told me
> > they couldn't do anything, because Oracle world wild support *needs* reproducible problems, so they
> > didn't forward it. Shame on them.
> >
> > So, what would you suggest next ?
>
> I would like to know if the standard 2.2.12/13 or 14pre (either will do here)
> kernels built with gcc/egcs and not pgcc also show the problem. Something
> very odd is going on if your rollbacks fail. I'd like to know what even
> if Oracle dont.

Me too, I'd like to know ! I have now 4 servers on RH5.2 (no problems so far) and 4 with Mandrake 6.0
at customer sites, all of them with Oracle databases. I experienced a *lot* of data corruption with
the laters. As I told you, a system crash usually means rollback segments corruption. But I had weirder
things:

one day, a filesystem appeared corrupted, in a table segment. I was able to export it while skipping
the corrupted block. So then I rebuilt from scratch the database and started the import, and a few
seconds later, it told me that an index creation failed because of block corruption. I rebooted the
system, recreated the database an then did the import smoothly.

Oracle support (France) told me of disabling asynchronous I/O, as it appeared buggy (many reports of
corruption on Solaris, but also other Unices). So I did it on *all* the sites I administer. But they
didn't want to foward the problem to Oracle World Wide support, as they cannot make a test suite to
reproduce the problem.

With all of that, I suspected memory corruption from oracle itself, but found nothing related in newsgroups,
then memory corruption with Linux, no report about it neither, then inconsistant bdflush. I think the
later would explain lot of things. (I had too a customer reporting the loss of half a day of work after
a system crash, though Oracle logs showed a successful recovery).

Here I am.

I will follow your suggestion of building gcc (but which version ?) and recompile a 2.2.13 kernel with it,
but it's hard to know if the problem goes away, until next crash/corruption.

Oh, another strange thing: the files created at 8:30am "disapeared", but the /var/log/message file has been
correctly written every half hours (MARK) until the power fail.

Waiting to hear from you,

Many thanks,

François Désarménien

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/