Re: [PATCH v21 18/24] ovpn: add support for peer floating
From: Sabrina Dubroca
Date: Mon Mar 10 2025 - 18:32:49 EST
Note: as I tried to say at the end of my previous reply, feel free to
ignore this or save it for later. dst_cache_reset on float and change
of remote via netlink peer_modify is the important thing, and I'm not
sure the "local address change on float" can or can't happen.
2025-03-10, 13:57:09 +0100, Antonio Quartulli wrote:
> On 07/03/2025 11:12, Sabrina Dubroca wrote:
> > 2025-03-06, 11:02:50 +0100, Antonio Quartulli wrote:
> > > On 05/03/2025 17:56, Sabrina Dubroca wrote:
> > > > 2025-03-05, 14:14:36 +0100, Antonio Quartulli wrote:
> > > > > On 05/03/2025 12:20, Sabrina Dubroca wrote:
> > > > > > 2025-03-05, 00:19:32 +0100, Antonio Quartulli wrote:
> > > > > > > On 04/03/2025 19:37, Sabrina Dubroca wrote:
> > > > > > > > 2025-03-04, 01:33:48 +0100, Antonio Quartulli wrote:
> > > > > > > > > +void ovpn_peer_endpoints_update(struct ovpn_peer *peer, struct sk_buff *skb)
> > > > > > > > > +{
> > > > > > > > > + struct hlist_nulls_head *nhead;
> > > > > > > > > + struct sockaddr_storage ss;
> > > > > > > > > + const u8 *local_ip = NULL;
> > > > > > > > > + struct sockaddr_in6 *sa6;
> > > > > > > > > + struct sockaddr_in *sa;
> > > > > > > > > + struct ovpn_bind *bind;
> > > > > > > > > + size_t salen = 0;
> > > > > > > > > +
> > > > > > > > > + spin_lock_bh(&peer->lock);
> > > > > > > > > + bind = rcu_dereference_protected(peer->bind,
> > > > > > > > > + lockdep_is_held(&peer->lock));
> > > > > > > > > + if (unlikely(!bind))
> > > > > > > > > + goto unlock;
> > > > > > > > > +
> > > > > > > > > + switch (skb->protocol) {
> > > > > > > > > + case htons(ETH_P_IP):
> > > > > > > > > + /* float check */
> > > > > > > > > + if (unlikely(!ovpn_bind_skb_src_match(bind, skb))) {
> > > > > > > > > + if (bind->remote.in4.sin_family == AF_INET)
> > > > > > > > > + local_ip = (u8 *)&bind->local;
> > > > > > > >
> > > > > > > > If I'm reading this correctly, we always reuse the existing local
> > > > > > > > address when we have to re-create the bind, even if it doesn't match
> > > > > > > > the skb? The "local endpoint update" chunk below is doing that, but
> > > > > > > > only if we're keeping the same remote? It'll get updated the next time
> > > > > > > > we receive a packet and call ovpn_peer_endpoints_update.
> > > > > > > >
> > > > > > > > That might irritate the RPF check on the other side, if we still use
> > > > > > > > our "old" source to talk to the new dest?
> > > > > > > >
> > > > > > > > > + sa = (struct sockaddr_in *)&ss;
> > > > > > > > > + sa->sin_family = AF_INET;
> > > > > > > > > + sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
> > > > > > > > > + sa->sin_port = udp_hdr(skb)->source;
> > > > > > > > > + salen = sizeof(*sa);
> > > > > > > > > + break;
> > > > > > >
> > > > > > > I think the issue is simply this 'break' above - by removing it, everything
> > > > > > > should work as expected.
> > > > > >
> > > > > > Only if the bind was of the correct family? Checking an IPv4 local
> > > > > > address (in the bind) against an IPv6 source address in the packet (or
> > > > > > the other way around) isn't going to work well.
> > > > >
> > > > > Ah I understand what you mean.
> > > > >
> > > > > The purpose of "local_ip" is to provide a working local endpoint to be used
> > > > > with the new remote address.
> > > > > However, if the float is switching family we can't re-use the same old local
> > > > > endpoint (hence the check).
> > > > > In this case we'll learn the "new" local address later.
> > > > >
> > > > > Does it make sense?
> > > >
> > > > Sure, but we could have learned it immediately from the packet we just
> > > > got, whether we're changing family or not. No need to wait for the
> > > > next RX packet to also learn the new local address.
> > >
> > > Indeed.
> > >
> > > >
> > > > But if we now do a dst_cache_reset with the peer float,
> > > > ovpn_udp*_output will have to do a new route/local address lookup and
> > > > I guess that should clean up the local address stored in the bind, and
> > > > then update the dst_cache with the local address we just found.
> > >
> > > Right and this may not truly be what we want.
> > >
> > > If peer X is sending packets to our IP1, we should at least try to reply
> > > from the same address.
> > >
> > > If we have two IPs, IP1 and IP2, and both can be used to reach peer X, we
> > > should always try to use the one where we received traffic from X in the
> > > first place.
> >
> > I had a thought that it might not be our prefered address to talk to
> > X, but it would probably be, since we decided to use it (and thus X
> > used it as remote to talk to us).
>
> I am not sure I follow this sentence: I think you are just confirming what I
> said above (please correct me if I am wrong)?
Yes. This may turn out to be incorrect (see the "some corner cases"
bit below), but let's wait for examples of that.
> > > OTOH hand it is also true that with floating detection on both sides, the
> > > situation will converge quickly, but there might be a reason why X chose IP1
> > > as destination, therefore we should do our best to respect that.
> >
> > And I guess the primary reason for X to choose IP1 would be "we sent
> > packets to X from IP1".
>
> Probably. It truly depends on who initiated the connection.
>
> >
> > > So, even in case of float, we should still store the local endpoint and
> > > attempt fetching a route that takes that into consideration.
> > > Which I think is what is happening (assuming we reset the dst_cache on
> > > float).
> >
> > Not at the same time as float, unless ovpn_peer_endpoints_update sets
> > local_ip = ip_hdr(skb)->daddr unconditionally on float?
> >
> > Otherwise the next route lookup in ovpn_udpX_output will pick whatever
> > source address it wants (which would likely match what's in the
> > received skb during float, so probably fine anyway).
> >
>
> But that's what the code just below in ovpn_peer_endpoints_update() does,
> no?
I think this is going a bit in circles :) And possibly completely
irrelevant.
The /* local endpoint update */ is (correctly) skipped in case of
float [because the family may not be right so we can't compare
addresses].
In case of float, I think setting local_ip to the packet's header
would do the right thing is all cases:
- if it matches what's in the old bind, great, we didn't change our
local IP, just the remote and we could have kept local_ip = &bind->local
- if it doesn't match, we learn it (but that brings the question: why
did the peer think our address was IP2 and we thought it was IP1? --
which may be an irrelevant corner case)
>
> 223 /* local endpoint update */
> 224 if (unlikely(bind->local.ipv4.s_addr !=
> ip_hdr(skb)->daddr)) {
>
> ...
>
> 229 bind->local.ipv4.s_addr = ip_hdr(skb)->daddr;
>
>
> > > ovpn_udpX_output() will:
> > > * get no rt from the cache
> > > * possibly confirm that saddr is ok
> > > * fetch the new rt using the provided saddr and daddr
> > > * update the cache.
> > >
> > > That makes sense to me.
> > > Would you agree?
> >
> > With dst_cache reset on float, yes. As long as we have that, the main
> > behavior seems correct to me. (maybe some corner cases will not be
> > handled optimally, but that can be improved later - which is most
> > likely what I've been discussing in these emails :))
>
> Yeah :)
> >
> > [this could be a useful counter to add in the future: number of floats
> > and local address updates - so the user can check if that's increasing
> > "too often", which would indicate something weird is happening]
>
> ACK, good idea!
>
> Thanks!
>
> Ok, I'll probably wait a little more and then prepare v22.
I won't be able to look at it until the week-end. But if you're ready,
you can go ahead and post it anyway.
--
Sabrina