Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

From: Ilpo Järvinen
Date: Fri Jun 06 2008 - 06:03:24 EST


On Thu, 5 Jun 2008, Patrick McManus wrote:

> On Fri, 2008-06-06 at 00:13 +0300, Ilpo Järvinen wrote:
>
> >
> > I'm out of new ideas what could be still wrong (I got confused and
> > lost track number of times while I tried to verify socket locking
> > today and probably don't have more time for that now)... Unless
> > somebody else (Patrick?) comes up with something quickly,
>
> Sorry, I don't see anything - it seems to boil down to the same code in
> the DA and non-DA case as far as I can tell, but after a while all the
> twisty passages seem to look alike.

:-)

Ingo's testcase should be quite "simple" anyway; I mean that distcc
shouldn't do anything unexpected, in the sense that it shouldn't abort
the flows by not sending data, close the listening socket, or do other
things like that.

> If Ingo confirms that the recv end was running the locking patch code,
> it would be interesting to just confirm the sysrq+t looks the same as
> before - it is possible the patch turned the race into a non-obvious
> deadlock.

...Yes, but we want that from the receiver's host rather than from the
sender's end. Also, checking that the sender is still doing window probes
once per 2 min is probably worth it, though a change in that is quite
unlikely.

> I'm sure your smaller revert will make the problem go away just as the
> larger one did, fwiw.

I'd very much expect it to.

> The other odd thing is that Ingo did a lot of experimentation and was
> only making this happen on localhost before (though I agree there is
> nothing inherent about that lock and localhost) - isn't it odd that the
> first trigger of it now is between two hosts? What do you make of that?

...No, it occurred a couple of times in the past between hosts as well,
so there is nothing new in that.

--
i.