Re: shm bugs revisited (2)

From: Stephen C. Tweedie (sct@redhat.com)
Date: Fri Jan 28 2000 - 13:06:12 EST


Hi,

On Fri, 28 Jan 2000 19:33:38 +0100 (CET), Ingo Molnar
<mingo@chiara.csoma.elte.hu> said:

> yep they should, unfortunately. Ok, the TLB thing can be ruled out pretty
> safely.

> this really is a mystery. Does it happen if you use 'init=/bin/bash' or
> 'init=/bin/ash' as an init process?

Tried that a while ago. It fails. The bash case falls over pretty
quickly. Using ash gets a little further, but not much. Using sash
gets me to a command line prompt but actually doing anything with it
results in the child dying with sig-11.

> One interesting thing to see would be if you could print out a trace of
> pagefaults and compare it with a highmem-pagecache-disabled kernel (but
> otherwise compiled identically).

Will do. I've already traced the faulty kernel quite extensively, and
the fault address is just outside of the mapped area. I'll do the same
with a working kernel.

> Also, can you somehow rule out the effect of IO, or could you make it
> deterministic?

The traces I obtained last time I attacked this were all completely
deterministic (except for occasional variations due to the pgd bug ---
I don't see that failure mode any more). The kernel dies while still
paging in init, so we're not yet at the point where there is concurrent
IO going on when things fall apart. I've already eliminated things like
IO readahead as possible culprits: setting the paging cluster size to 1
page doesn't change things.

--Stephen



