Re: [stable] 2.6.16.6 breaks java... sort of

From: David Wilk
Date: Tue Apr 25 2006 - 14:08:51 EST


OK, I think I owe everyone here an apology. I have found the
problem, and it is not with your patch, Hugh. For some reason, the
config for my 2.6.16.7 source tree had a 1G/3G user/kernel split
configured. This is very bizarre, as I copy my .config from tree to
tree precisely to avoid any changes in the configuration of my test
kernels.

I imagine failing an 832M mmap is simply the expected behavior when
only 1G is configured for user space? I will be sure to include my
/proc/config.gz in the future to prevent this from happening again.

I've tested 2.6.16.7 and 2.6.16.9, and neither of them has the mmap constraint.
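
In case it helps anyone else hitting this, here is a trivial probe
that just repeats the 872415232-byte (832M) anonymous mapping from
the strace. The flags are my best guess at an equivalent request,
not lifted from the JVM, so treat it as a rough check only:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	/* the same size as the failing mmap2() in the trace: 832M */
	size_t len = 872415232;

	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");	/* ENOMEM: no contiguous 832M hole */
		return 1;
	}
	printf("got 832M at %p\n", p);
	munmap(p, len);
	return 0;
}

On a 1G/3G split I'd expect it to fail with ENOMEM exactly as java
does, since a 1G address space has no 832M hole left once the binary,
libraries, heap and stack are mapped; on the usual 3G/1G split it
should succeed.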

Again, my apologies, and thank you for your patience and your hard
work on the kernel.

thanks,
Dave

On 4/24/06, David Wilk <davidwilk@xxxxxxxxx> wrote:
> On 4/23/06, Hugh Dickins <hugh@xxxxxxxxxxx> wrote:
> > On Fri, 21 Apr 2006, David Wilk wrote:
> > > I finally got strace into the tomcat startup scripts properly and
> > > grabbed the attached output. I don't see either of the two lines
> > > you proposed. I hope you guys can find this useful.
> >
> > Thanks for getting that, David. As you observe, it doesn't involve
> > shm at all, and the only mprotect is PROT_NONE. Do the abbreviated
> > messages in the final lines of the trace fit with the errors you
> > were originally reporting? (I think so.) Or is this particular
> > trace failing for some other reason, earlier than before, and we
> > need to try something else to identify the problem?
>
> I think this trace captures java doing exactly what it was doing
> before. I restarted java many, many times with 2.6.16.9 and watched
> in amazement as it failed every single time, with the exact same
> error message. This is Tomcat, specifically; it died immediately
> after startup, and it was started the same way every time, via the
> init script. So, yes, I think this trace is representative. I have
> no idea why it doesn't contain what you would consider relevant;
> I'm no programmer, unfortunately. However, I would like to help out
> in any way that I can.
> >
> > > mmap2(NULL, 872415232, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
> > > write(1, "Error occurred during initializa"..., 43) = 43
> > > write(1, "Could not reserve enough space f"..., 46) = 46
> > > write(1, "\n", 1) = 1
> > > unlink("/tmp/hsperfdata_tomcat/12273") = 0
> > > write(2, "Could not create the Java virtua"..., 43) = 43
> >
> > To judge by this trace, I'd have to say that your problem has
> > nothing whatever to do with the shm/mprotect fix in 2.6.16.6,
> > and we've no evidence yet that that fix needs complicating.
> > Interestingly, nobody else has so far reported any problem with it.
>
> Bizarre. I must say that this patch 'fixing' the problem just cannot
> be coincidental: Tomcat never starts without it, and never fails
> with it.
> >
> > Judging by the mmap addresses throughout the trace (top down, from
> > 0x37f2e000), it looks like you've got CONFIG_VMSPLIT_1G (not a good
> > choice for a box with only 1G of RAM: whereas CONFIG_VMSPLIT_3G_OPT
> > would maximize your userspace while avoiding the need for HIGHMEM);
> > and with the above 832M mmap, the remaining hole in user address
> > space is just too small to hold it.
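> >
> > (A quick way to check which split a running kernel has, by the way:
> > look at where the stack sits in /proc/self/maps. Illustrative
> > output, not copied from your box:
> >
> >   $ grep stack /proc/self/maps
> >   bfe96000-bfeab000 rw-p 00000000 00:00 0   [stack]
> >
> > A stack just under 0xC0000000, as above, means the usual 3G/1G
> > split; a stack just under 0x40000000 would confirm the 1G/3G split
> > I'm inferring from your trace.)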
>
> Well, this is just a test box for a system we deploy on a dual-CPU
> server with 4GB of RAM.
>
> Can you describe CONFIG_VMSPLIT_3G_OPT and how I might find it in
> 'make menuconfig'? Is it the second option (3G/1G user/kernel)? I've
> got the first option selected, which is also labelled 3G/1G
> user/kernel but is apparently different somehow. As these options are
> new in 2.6.16, I'm not familiar with them. Perhaps my 1GB
> workstations would benefit from this second 3G/1G option?
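>
> For reference, here is roughly how that choice appears in my 'make
> menuconfig', under "Processor type and features" -> "Memory split"
> (quoting from memory, so the wording may be slightly off):
>
>   (X) 3G/1G user/kernel split
>   ( ) 3G/1G user/kernel split (for full 1G low memory)
>   ( ) 2G/2G user/kernel split
>   ( ) 1G/3G user/kernel split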
> >
> > But that leaves me quite unable to explain why you should have
> > thought the shm/mprotect patch responsible, and why you should
> > find the more complicated version works. Stack randomization
> > changes the numbers a little, but I think not enough to explain
> > how it sometimes could fit 832M in there, sometimes not.
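> >
> > Rough numbers, treating the layout only approximately:
> >
> >   user space with VMSPLIT_1G:  0x00000000 - 0x40000000  = 1024M
> >   the mapping java requests:   872415232 bytes          =  832M
> >   left for binary, libraries, heap and stack            =  192M
> >
> > Randomization moves the stack base by a few megabytes at most, so
> > it could only flip the outcome if the 832M were already fitting or
> > failing by a similar margin.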
>
> Unfortunately I cannot speculate as to the cause, but experimentally
> (anecdotally anyway) the patch is 100% effective.
> >
> > Tell me I'm talking nonsense and we'll have another go:
> > I guess adding some printks on top of the "replacement"
> > patch, so it can tell us when it's having an effect.
>
> I'd never accuse you of nonsense, but I cannot refute the evidence. ; )
>
> If there is anything else you would like me to do to try to squeeze
> more data out of this thing, please let me know.
> >
> > Hugh
> >
>