RE: [PATCH] CIFS: Decrease reconnection delay when switching nics

From: Tom Talpey
Date: Thu Feb 28 2013 - 08:02:35 EST


> -----Original Message-----
> From: samba-technical-bounces@xxxxxxxxxxxxxxx [mailto:samba-technical-bounces@xxxxxxxxxxxxxxx] On Behalf Of Stefan (metze) Metzmacher
> Sent: Wednesday, February 27, 2013 7:16 PM
> To: Jeff Layton
> Cc: Steve French; Dave Chiluk; samba-technical@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-cifs@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] CIFS: Decrease reconnection delay when switching nics
>
> Am 27.02.2013 17:34, schrieb Jeff Layton:
> > On Wed, 27 Feb 2013 12:06:14 +0100
> > "Stefan (metze) Metzmacher" <metze@xxxxxxxxx> wrote:
> >
> >> Hi Dave,
> >>
> >>> When messages are queued awaiting a response, decrease the amount
> >>> of time before attempting cifs_reconnect to SMB_MAX_RTT = 10
> >>> seconds. The current wait time before attempting to reconnect is
> >>> 2*SMB_ECHO_INTERVAL (120 seconds) since the last response was
> >>> received. This does not take into account the fact that messages
> >>> waiting for a response should be serviced within a reasonable
> >>> round trip time.
> >>
> >> Wouldn't that mean that the client will disconnect a good connection
> >> if the server doesn't respond within 10 seconds?
> >> Reads and Writes can take longer than 10 seconds...
> >>
> >
> > Where does this magic value of 10s come from? Note that a slow server
> > can take *minutes* to respond to writes that are long past the EOF.
> >
> >>> This fixes the issue where the user moves from wired to wireless or
> >>> vice versa, causing the mount to hang for 120 seconds when it could
> >>> reconnect considerably faster. After this fix it will take
> >>> SMB_MAX_RTT (10 seconds) from the last time the user attempted to
> >>> access the volume, or SMB_MAX_RTT after the last echo. The worst
> >>> case of the latter scenario is 2*SMB_ECHO_INTERVAL+SMB_MAX_RTT+small
> >>> scheduling delay (about 130 seconds). Statistically speaking it
> >>> would normally reconnect sooner. However, in the best case, where
> >>> the user changes NICs and immediately tries to access the CIFS
> >>> share, it will take SMB_MAX_RTT = 10 seconds.
> >>
> >> I think it would be better to detect the broken connection by using
> >> an AF_NETLINK socket listening for RTM_DELADDR messages?
> >>
> >> metze
> >>
> >
> > Ick -- that sounds horrid ;)
>
> This is what winbindd uses to detect that the source IP of outgoing
> connections is gone. I don't know much about the kernel; there might be a
> better way to detect this from within the kernel. But this is exactly the
> right trigger for failing over to another interface, as it fires exactly
> when the IP is removed, without messing with a timeout value.
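The AF_NETLINK approach described above looks roughly like this in userspace (where winbindd runs). This is a minimal sketch assuming Linux; open_addr_monitor() and count_deladdr() are hypothetical helper names, while socket(), bind(), the RTMGRP_* multicast groups and the NLMSG_OK/NLMSG_NEXT macros are the standard rtnetlink API. It is not winbindd's actual code.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Open a NETLINK_ROUTE socket subscribed to IPv4 and IPv6 address
 * notifications (RTM_NEWADDR / RTM_DELADDR). Returns the fd or -1. */
int open_addr_monitor(void)
{
    struct sockaddr_nl sa;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;
    sa.nl_groups = RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR;
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Walk one datagram read from the monitor socket and count the
 * RTM_DELADDR messages, i.e. addresses that were just removed.
 * len is an int, as in the netlink(7) examples, so NLMSG_NEXT's
 * subtraction cannot wrap around. */
int count_deladdr(const void *buf, int len)
{
    const struct nlmsghdr *nh;
    int n = 0;

    for (nh = buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len))
        if (nh->nlmsg_type == RTM_DELADDR)
            n++;
    return n;
}
```

A real consumer would recv() on the descriptor and, for each RTM_DELADDR, inspect the IFA_LOCAL/IFA_ADDRESS attributes to check whether the removed address is the connection's source address before failing over; that step is omitted here.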
>
> Another optimization would be to use TCP keepalives (there, I think 10
> seconds would be OK); I think that's what Windows SMB3 clients use.

Yes, they do. See MS-SMB2 behavior note 144 attached to section 3.2.5.14.9.

10 seconds seems a fairly rapid keepalive interval. The TCP stack probably
won't allow it to be less than the maximum retransmit, for instance.
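For reference, the keepalive alternative maps to Linux socket options as sketched below. This assumes a userspace TCP socket on Linux (the in-kernel CIFS client would configure its socket through kernel interfaces, not shown); enable_keepalive() and keepalive_detect_time() are hypothetical names, while SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the standard options. The 10-second idle value comes from the thread; any interval and probe count you pass are illustrative.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalives on a TCP socket: first probe after idle_s seconds
 * of silence, then one every intvl_s seconds, declaring the peer dead
 * after cnt unanswered probes. TCP_KEEP* options are Linux-specific.
 * Returns 0 on success, -1 if any setsockopt() fails. */
int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}

/* Approximate time, in seconds, before an idle dead peer is detected. */
int keepalive_detect_time(int idle_s, int intvl_s, int cnt)
{
    return idle_s + intvl_s * cnt;
}
```

With the Linux defaults (net.ipv4.tcp_keepalive_time = 7200, tcp_keepalive_intvl = 75, tcp_keepalive_probes = 9), keepalive_detect_time(7200, 75, 9) is 7875 seconds, which shows how aggressive a 10-second idle time is by comparison. Note also that Linux sends keepalive probes only while no data is outstanding; with unacknowledged data in flight, the retransmission timer governs instead.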

Tom.