Re: Fatal Oops on boot with 2.4.0testX and recent GCC snapshots

From: Andreas Franck (afranck@gmx.de)
Date: Wed Dec 27 2000 - 19:26:19 EST


Hello Mike, hello Linus,

Some minutes ago, I wrote:
> I think I have found the reason for our bugs. It seems GCC really
> miscompiles buffer.c:bdflush_init without frame pointers. I'll try harder
> now to understand what exactly is going on, but it seems it is smashing
> its local stack space by decrementing its stack pointer too early, then
> calling an assembler function (__down_failed). It might be that GCC is
> confused by this.

[...]

> Any comments on this? I'll now try to split up the stack space operation in
> two parts, the first after call kernel_thread: addl $12, %esp (as in the
> first call), and an additional addl $64, %esp just before leaving (before
> popl %ebx). And I'll report what happened, later - but I have a good
> feeling that I have caught the bug.

... and my good feeling was right. Changing the bogus assembly code made the
bug go away. I'll try to prepare a simpler testcase for the GCC maintainers
tomorrow. In short, this is what happens: GCC frees the stack frame holding
the local variables far too early. It then calls __down_failed(), which
pushes some things on the stack - thereby overwriting the semaphore pointer!
So __down() works on a random memory location instead of the semaphore, which
is guaranteed to fail badly.

I've added linux-kernel as CC again, so everybody can now hear that this is
definitely a GCC bug, and not a kernel issue.

Greetings,
Andreas

-- 
->>>----------------------- Andreas Franck --------<<<-
---<<<---- Andreas.Franck@post.rwth-aachen.de --->>>---
->>>---- Keep smiling! ----------------------------<<<-


