Re: Fw: crash on x86_64 - mm related?

From: Hugh Dickins
Date: Wed Dec 07 2005 - 13:37:39 EST


On Tue, 6 Dec 2005, Ryan Richter wrote:
> On Tue, Dec 06, 2005 at 08:31:43PM +0000, Hugh Dickins wrote:
> > Thanks for the further report. And you had my st.c patch in along
> > with 2.6.14.3, but it still happened, very much like before (except the
> > latter errors, general protection fault onwards - but once we get into
> > using one page for two uses at the same time, anything can go wrong).
> >
> > I've been staring and thinking, but no inspiration yet.
>
> That's correct, thanks for looking. Let me know if there's anything I
> could do to get more information.

Still no luck with it, I'm afraid: I've found no error in st.c,
beyond what the patch already fixed, to account for what you see.

I have noticed that struct st_buffer's unsigned char do_dio ought to be
unsigned short, in order to accommodate its maximum value of 256; but
that would not account for your symptoms, nor does taper approach that
maximum.
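
To illustrate the do_dio width point, here is a minimal sketch, assuming
the usual 8-bit char. The helpers store_as_uchar() and store_as_ushort()
are hypothetical names made up for this example and are not taken from
st.c; they only show what happens when a count of 256 is stored in each
type.

```c
#include <limits.h>

/*
 * Hypothetical helpers, not from st.c: each stores a count into a
 * field of the given width and returns what the field then holds.
 */
static unsigned int store_as_uchar(unsigned int count)
{
	unsigned char field = (unsigned char)count;	/* reduced modulo UCHAR_MAX + 1 */

	return field;
}

static unsigned int store_as_ushort(unsigned int count)
{
	unsigned short field = (unsigned short)count;	/* holds at least 0..65535 */

	return field;
}
```

With an 8-bit char, store_as_uchar(256) comes back as 0 while
store_as_ushort(256) comes back as 256, so a count of 256 would be
lost in the narrower field.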

The symptoms are consistent with that do_dio field (near the start of
struct st_buffer) being corrupted, infrequently but repeatedly - the
same page has been mis-released three times before your first error
message. But I see nothing wrong with st.c's handling of do_dio,
nor with get_user_pages' returned count.

So I think the best I can suggest is that you rebuild your kernel with
CONFIG_DEBUG_SLAB=y, and see if that sheds any light. And if you don't
mind doing so, advancing to 2.6.15-rc5 (which includes my patch) with
CONFIG_DEBUG_SLAB=y would be even better: there's a chance it's a bug
that's already been fixed. Though that won't really be proved if you
don't hit the problem, since the new kernel will be sufficiently
different that it might just have shifted the bug aside.

To be honest, the symptoms seem a little too regular to blame someone
overrunning their slab; but I've searched in vain and must move on.

Hugh