Re: [PATCH net-next v11 15/23] ovpn: implement keepalive mechanism

From: Antonio Quartulli
Date: Tue Nov 12 2024 - 08:20:35 EST


On 05/11/2024 19:10, Sabrina Dubroca wrote:
2024-10-29, 11:47:28 +0100, Antonio Quartulli wrote:
@@ -105,6 +132,9 @@ void ovpn_decrypt_post(void *data, int ret)
goto drop;
}
+ /* keep track of last received authenticated packet for keepalive */
+ peer->last_recv = ktime_get_real_seconds();

It doesn't look like we're locking the peer here so that should be a
WRITE_ONCE() (and READ_ONCE(peer->last_recv) for all reads).

Is that because last_recv is 64 bits long (and might span more than one word on certain architectures)?

I don't remember having to do so when reading/writing 32-bit integers.

I presume we also need a WRITE_ONCE() upon initialization in ovpn_peer_keepalive_set(), right?
We still want to coordinate that with other reads/writes.
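
For illustration only, here is a minimal userspace sketch of the pairing being discussed. The two macros are simplified stand-ins written for this example (the real kernel definitions are more elaborate), they rely on the GCC/Clang __typeof__ extension, and struct demo_peer and both helpers are hypothetical names, not part of the patch. The point is that the unlocked writer and the reader each perform one marked access, and the reader works on a local snapshot afterwards.

```c
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel macros (assumption: the
 * real kernel definitions do more). The volatile cast asks the compiler
 * to emit exactly one access and not to cache or repeat it.
 */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))

/* Hypothetical cut-down peer holding only the keepalive timestamps. */
struct demo_peer {
	int64_t last_recv;
	int64_t last_sent;
};

/* Writer side: runs without the peer lock, e.g. after decryption. */
static void demo_mark_received(struct demo_peer *peer, int64_t now)
{
	WRITE_ONCE(peer->last_recv, now);
}

/* Reader side: take one snapshot and only use the local copy afterwards. */
static int64_t demo_seconds_since_recv(struct demo_peer *peer, int64_t now)
{
	int64_t last_recv = READ_ONCE(peer->last_recv);

	return now - last_recv;
}
```

The same pattern would apply to last_sent, and to the initial store in ovpn_peer_keepalive_set() if that one is annotated too.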


+
/* point to encapsulated IP packet */
__skb_pull(skb, payload_offset);
@@ -121,6 +151,12 @@ void ovpn_decrypt_post(void *data, int ret)
goto drop;
}
+ if (ovpn_is_keepalive(skb)) {
+ net_dbg_ratelimited("%s: ping received from peer %u\n",
+ peer->ovpn->dev->name, peer->id);
+ goto drop;

To help with debugging connectivity issues, maybe keepalives shouldn't
be counted as drops? (consume_skb instead of kfree_skb, and not
incrementing rx_dropped)
The packet was successfully received and did all it had to do.

You're absolutely right. Will change that.
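
A rough userspace model of the accounting difference, assuming hypothetical counters and names (struct demo_stats, demo_rx_non_ip) that do not exist in the patch. In the kernel the distinction would be consume_skb() for the keepalive versus kfree_skb() plus the rx_dropped increment for a real drop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters standing in for the device statistics. */
struct demo_stats {
	uint64_t rx_consumed; /* packets handled and released normally */
	uint64_t rx_dropped;  /* packets discarded because of an error */
};

enum demo_verdict { DEMO_CONSUMED, DEMO_DROPPED };

/* Decide how a non-IP payload is accounted for: a keepalive did its job
 * and is consumed, anything else is an unsupported protocol and dropped.
 */
static enum demo_verdict demo_rx_non_ip(struct demo_stats *stats,
					bool is_keepalive)
{
	if (is_keepalive) {
		stats->rx_consumed++;
		return DEMO_CONSUMED;
	}

	stats->rx_dropped++;
	return DEMO_DROPPED;
}
```

With this split, a link that only carries keepalives no longer shows a steadily growing drop counter.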


+ }
+
net_info_ratelimited("%s: unsupported protocol received from peer %u\n",
peer->ovpn->dev->name, peer->id);
goto drop;
@@ -221,6 +257,10 @@ void ovpn_encrypt_post(void *data, int ret)
/* no transport configured yet */
goto err;
}
+
+ /* keep track of last sent packet for keepalive */
+ peer->last_sent = ktime_get_real_seconds();

And another WRITE_ONCE() here (also paired with READ_ONCE() on the
read side).

Yep.



+static int ovpn_peer_del_nolock(struct ovpn_peer *peer,
+ enum ovpn_del_peer_reason reason)
+{
+ switch (peer->ovpn->mode) {
+ case OVPN_MODE_MP:

I think it would be nice to add

lockdep_assert_held(&peer->ovpn->peers->lock);

+ return ovpn_peer_del_mp(peer, reason);
+ case OVPN_MODE_P2P:

and here

lockdep_assert_held(&peer->ovpn->lock);

Yeah, good idea.
__must_hold() can't work here, so lockdep_assert_held() is definitely the way to go.


(I had to check that ovpn_peer_del_nolock is indeed called with those
locks held since they're taken by ovpn_peer_keepalive_work_{mp,p2p},
adding these assertions would make it clear that ovpn_peer_del_nolock
is not an unsafe version of ovpn_peer_del)

Right, it makes sense.
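
To make the intent concrete, here is a small userspace sketch of the "caller must hold the lock" contract. Everything in it is hypothetical (struct demo_lock, demo_assert_held, demo_peer_del_nolock): it only tracks a held flag, whereas lockdep in the kernel checks that the current context holds the lock and only does so when lockdep is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only remembers whether it is currently held. */
struct demo_lock {
	bool held;
};

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Stand-in for lockdep_assert_held(): trap if the lock is not held. */
#define demo_assert_held(l) assert((l)->held)

/* "_nolock" means "the caller already took the lock", not "unsafe". */
static int demo_peer_del_nolock(struct demo_lock *peers_lock, int *peer_count)
{
	demo_assert_held(peers_lock);

	if (*peer_count == 0)
		return -1;

	(*peer_count)--;
	return 0;
}
```

The assertion documents the locking rule at the top of the function, so a reader does not have to trace every caller to confirm it.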


+ return ovpn_peer_del_p2p(peer, reason);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
/**
* ovpn_peers_free - free all peers in the instance
* @ovpn: the instance whose peers should be released
@@ -830,3 +871,150 @@ void ovpn_peers_free(struct ovpn_struct *ovpn)
ovpn_peer_unhash(peer, OVPN_DEL_PEER_REASON_TEARDOWN);
spin_unlock_bh(&ovpn->peers->lock);
}
+
+static time64_t ovpn_peer_keepalive_work_single(struct ovpn_peer *peer,
+ time64_t now)
+{
+ time64_t next_run1, next_run2, delta;
+ unsigned long timeout, interval;
+ bool expired;
+
+ spin_lock_bh(&peer->lock);
+ /* we expect both timers to be configured at the same time,
+ * therefore bail out if either is not set
+ */
+ if (!peer->keepalive_timeout || !peer->keepalive_interval) {
+ spin_unlock_bh(&peer->lock);
+ return 0;
+ }
+
+ /* check for peer timeout */
+ expired = false;
+ timeout = peer->keepalive_timeout;
+ delta = now - peer->last_recv;

I'm not sure that's always > 0 if we finish decrypting a packet just
as the workqueue starts:

ovpn_peer_keepalive_work
now = ...

ovpn_decrypt_post
peer->last_recv = ...

ovpn_peer_keepalive_work_single
delta: now < peer->last_recv


Yeah, there is nothing preventing this from happening... but is this truly a problem? The math should still work, no?

However:



+ if (delta < timeout) {
+ peer->keepalive_recv_exp = now + timeout - delta;

I'd shorten that to

peer->keepalive_recv_exp = peer->last_recv + timeout;

it's a bit more readable to my eyes and avoids risks of wrapping
values.

So I'd probably get rid of delta and go with:

last_recv = READ_ONCE(peer->last_recv);
if (now < last_recv + timeout) {
peer->keepalive_recv_exp = last_recv + timeout;
next_run1 = peer->keepalive_recv_exp;
} else if ...

+ next_run1 = peer->keepalive_recv_exp;
+ } else if (peer->keepalive_recv_exp > now) {
+ next_run1 = peer->keepalive_recv_exp;
+ } else {
+ expired = true;
+ }

I agree this is simpler to read and gets rid of some extra operations.

[note: I took inspiration from nat_keepalive_work_single() - it could be simplified as well I guess]
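
As a sanity check of the delta-free form, here is a self-contained userspace sketch. The helper name and signature (demo_next_deadline) are made up for this example and all times are plain int64_t seconds; it is not the code that will go into the next revision. It follows the three branches above: refresh the deadline, keep a still-valid deadline, or report expiry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the time (in seconds) at which the check should run again, or 0
 * if the deadline has passed, in which case *expired is set to true.
 * last_event is assumed to be a snapshot taken once by the caller.
 */
static int64_t demo_next_deadline(int64_t now, int64_t last_event,
				  int64_t period, int64_t *deadline,
				  bool *expired)
{
	*expired = false;

	if (now < last_event + period) {
		/* recent activity: move the deadline forward */
		*deadline = last_event + period;
		return *deadline;
	}

	if (*deadline > now) {
		/* previously stored deadline is still in the future */
		return *deadline;
	}

	*expired = true;
	return 0;
}
```

With now = 100, last_event = 101 and period = 10 (the race described above, where the timestamp is newer than now), the first branch is taken and the deadline becomes 111, with no subtraction that could go negative. The same helper shape would fit the last_sent / keepalive_interval check.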


[...]
+ /* check for peer keepalive */
+ expired = false;
+ interval = peer->keepalive_interval;
+ delta = now - peer->last_sent;
+ if (delta < interval) {
+ peer->keepalive_xmit_exp = now + interval - delta;
+ next_run2 = peer->keepalive_xmit_exp;

and same here

Yeah, will change both. Thanks!


Regards,


--
Antonio Quartulli
OpenVPN Inc.