Re: [PATCH] CIFS: Decrease reconnection delay when switching nics
From: Stefan (metze) Metzmacher
Date: Wed Feb 27 2013 - 19:16:11 EST
On 27.02.2013 17:34, Jeff Layton wrote:
> On Wed, 27 Feb 2013 12:06:14 +0100
> "Stefan (metze) Metzmacher" <metze@xxxxxxxxx> wrote:
>
>> Hi Dave,
>>
>>> When messages are currently queued awaiting a response, decrease the amount of
>>> time before attempting cifs_reconnect to SMB_MAX_RTT = 10 seconds. The current
>>> wait time before attempting to reconnect is 2*SMB_ECHO_INTERVAL (120
>>> seconds) since the last response was received. This does not take into account
>>> the fact that messages waiting for a response should be serviced within a
>>> reasonable round trip time.
>>
>> Wouldn't that mean that the client will disconnect a good connection
>> if the server doesn't respond within 10 seconds?
>> Reads and writes can take longer than 10 seconds...
>>
>
> Where does this magic value of 10s come from? Note that a slow server
> can take *minutes* to respond to writes that are long past the EOF.
>
>>> This fixes the issue where the user moves from wired to wireless or vice versa,
>>> causing the mount to hang for 120 seconds when it could reconnect considerably
>>> faster. After this fix it will take SMB_MAX_RTT (10 seconds) from the last
>>> time the user attempted to access the volume, or SMB_MAX_RTT after the last
>>> echo. The worst case of the latter scenario is
>>> 2*SMB_ECHO_INTERVAL+SMB_MAX_RTT+small scheduling delay (about 130 seconds).
>>> Statistically speaking, it would normally reconnect sooner. However, in the best
>>> case, where the user changes NICs and immediately tries to access the CIFS
>>> share, it will take SMB_MAX_RTT=10 seconds.
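To make the timeout rule concrete, here is a minimal userspace sketch of the reconnect check as the patch description states it. It is not the cifs code: times are plain seconds rather than jiffies, SMB_ECHO_INTERVAL = 60 is inferred from "2*SMB_ECHO_INTERVAL (120 seconds)", and should_reconnect() and its parameters are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* Values as stated in the patch description; SMB_ECHO_INTERVAL is
 * inferred from "2*SMB_ECHO_INTERVAL (120 seconds)". Plain seconds
 * are used here instead of kernel jiffies. */
#define SMB_ECHO_INTERVAL 60
#define SMB_MAX_RTT       10

/*
 * Hypothetical model of the proposed reconnect check.
 *   now                 - current time
 *   last_response       - time the last response arrived from the server
 *   requests_pending    - whether any request (including an echo) awaits a reply
 *   oldest_request_sent - send time of the oldest outstanding request
 */
bool should_reconnect(time_t now, time_t last_response,
                      bool requests_pending, time_t oldest_request_sent)
{
    /* Existing rule: nothing heard from the server for two echo intervals. */
    if (now - last_response > 2 * SMB_ECHO_INTERVAL)
        return true;

    /* Proposed rule: a request has been outstanding longer than SMB_MAX_RTT,
     * regardless of how long the server legitimately needs to process it. */
    if (requests_pending && now - oldest_request_sent > SMB_MAX_RTT)
        return true;

    return false;
}
```

Under this model, a write that the server takes 11 seconds to answer, e.g. should_reconnect(50, 45, true, 39), already returns true even though the server sent some other response 5 seconds earlier, which is the false disconnect the reviewers object to.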
>>
>> Wouldn't it be better to detect the broken connection
>> by using an AF_NETLINK socket listening for RTM_DELADDR
>> messages?
>>
>> metze
>>
>
> Ick -- that sounds horrid ;)

This is what winbindd uses to detect that the source IP of outgoing
connections is gone. I don't know much about the kernel; there might be a
better way to detect this from within the kernel. But this is exactly the
right thing to do to fail over to another interface, as it happens as soon
as the IP is removed, without messing with a timeout value.
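A minimal userspace sketch of the netlink approach, for IPv4 only: it subscribes a NETLINK_ROUTE socket to the RTMGRP_IPV4_IFADDR group and reports when an RTM_DELADDR message names the local address a connection is bound to. This is not how winbindd or the kernel implement it; open_addr_monitor(), addr_was_removed() and wait_for_addr_removal() are hypothetical helpers. An IPv6 variant would subscribe to RTMGRP_IPV6_IFADDR and would typically match on IFA_ADDRESS instead of IFA_LOCAL.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open a NETLINK_ROUTE socket that receives
 * IPv4 address add/remove notifications. Returns fd or -1. */
int open_addr_monitor(void)
{
    struct sockaddr_nl sa;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;
    sa.nl_groups = RTMGRP_IPV4_IFADDR;
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: true if buf (len bytes read from the socket)
 * contains an RTM_DELADDR for the IPv4 address 'local'. */
bool addr_was_removed(const void *buf, size_t len, struct in_addr local)
{
    int remaining = (int)len;
    const struct nlmsghdr *nh;

    for (nh = buf; NLMSG_OK(nh, remaining); nh = NLMSG_NEXT(nh, remaining)) {
        if (nh->nlmsg_type != RTM_DELADDR ||
            nh->nlmsg_len < NLMSG_LENGTH(sizeof(struct ifaddrmsg)))
            continue;
        const struct ifaddrmsg *ifa = NLMSG_DATA(nh);
        if (ifa->ifa_family != AF_INET)
            continue;
        int attrlen = IFA_PAYLOAD(nh);
        const struct rtattr *rta;
        for (rta = IFA_RTA(ifa); RTA_OK(rta, attrlen);
             rta = RTA_NEXT(rta, attrlen)) {
            /* IFA_LOCAL carries the interface's own IPv4 address. */
            if (rta->rta_type == IFA_LOCAL &&
                RTA_PAYLOAD(rta) == sizeof(local) &&
                memcmp(RTA_DATA(rta), &local, sizeof(local)) == 0)
                return true;
        }
    }
    return false;
}

/* Hypothetical helper: block until 'local' is removed.
 * Returns 0 on removal, -1 on a socket error. */
int wait_for_addr_removal(int fd, struct in_addr local)
{
    union { struct nlmsghdr hdr; char raw[8192]; } msg;

    for (;;) {
        ssize_t n = recv(fd, &msg, sizeof(msg), 0);
        if (n < 0)
            return -1;
        if (addr_was_removed(&msg, (size_t)n, local))
            return 0;
    }
}
```

A caller would obtain the connection's local address with getsockname() and treat a return of 0 from wait_for_addr_removal() as the signal to reconnect immediately, with no timeout involved.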
Another optimization would be to use TCP keepalives (I think 10 seconds
would be OK there); I think that's what Windows SMB3 clients use.
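The keepalive idea, sketched with the standard Linux TCP socket options on a userspace socket. The values are assumptions: 10 seconds is taken as the idle time before the first probe, and the probe interval (2 s) and count (3) are illustrative. enable_keepalive() is a hypothetical helper, and an in-kernel client would set the equivalent options through the kernel's own socket API rather than setsockopt().

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable TCP keepalive on fd with the given idle
 * time and probe interval (both in seconds) and probe count.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
int enable_keepalive(int fd, int idle, int intvl, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* Linux-specific: seconds of inactivity before the first probe. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    /* Seconds between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    /* Unanswered probes before the connection is considered dead. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}
```

With enable_keepalive(fd, 10, 2, 3), a dead peer is detected roughly 10 + 3 * 2 = 16 seconds after the connection goes quiet. Unlike a fixed per-request timeout, this does not penalize a server that is slow to answer a long write, because the probes are acknowledged by the server's TCP stack rather than by the SMB server itself.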
metze