[RFC PATCH] audit: Send netlink ACK before setting connection in auditd_set
From: Chris Riches
Date: Fri Sep 22 2023 - 11:53:53 EST
When auditd_set sets the auditd_conn pointer, audit messages can
immediately be put on the socket by other kernel threads. If the backlog
is large or the rate is high, this can immediately fill the socket
buffer. If the audit daemon requested an ACK for this operation, a full
socket buffer causes the ACK to get dropped, also setting ENOBUFS on the
socket.
To avoid this race and ensure ACKs get through, fast-track the ACK in
this specific case to ensure it is sent before auditd_conn is set.
Signed-off-by: Chris Riches <chris.riches@xxxxxxxxxxx>
---
This mail is more of an RFC than a patch, though the included patch is a
useful illustation and might even be suitable for inclusion.
We are experiencing strange failures where the audit daemon fails to
start, hitting an ENOBUFS error on its audit_set_pid() call. This can be
reproduced by repeatedly restarting the audit daemon (or just doing a
tight loop of audit_open(), audit_set_pid(pid), audit_set_pid(0),
audit_close()) while the system is under heavy audit load. This also
seems to be dependent on the number of CPUs - we can reproduce this with
2 CPUs but not with 48.
Tracing showed a race between the kernel setting auditd_conn and sending
the ACK, as described in the commit message. The included patch attempts
to fix this by ensuring the ACK is sent before any audit messages can be
put on the socket. This seems to fix the problem for auditd starting,
but strangely I still hit it when running my minimal repro (the tight
loop mentioned above), albeit less frequently. I'm not sure if the patch
or the repro is at fault.
We may also want to look at this from the userspace side in the audit
daemon itself (or more specifically, libaudit). The ACK handling is a
little odd - check_ack() will happily return success without seeing an
ACK if a non-ACK message is top of the socket queue, but will fail if no
message arrives within the timeout. It also of course fails if ENOBUFS
is set on the socket, but this failure only seems to matter when doing
audit_set_pid() - similar failures during main-loop message processing
are logged but otherwise ignored, as far as I can tell.
I'm not sure I quite understand the intentions of the code, but it seems
odd to let ENOBUFS be a fatal error here, given that it likely means the
socket buffer got flooded with audit messages, and thus audit_set_pid()
succeeded. Perhaps we should just ignore ENOBUFS (instead of or as well
as patching the kernel).
Finally, there is another oddity in audit_set_pid() that is tangential
to this discussion but worth highlighting: if the wmode parameter is
WAIT_YES (which it is for the audit daemon), then there is some
additional ACK-handling which waits for 100 milliseconds and eats the
top message of the socket queue if one arrives, without inspecting it.
This seems completely wrong as the ACK will have already been consumed
by check_ack() if there was one, and so the best this code can do is
nothing, and at worst it will swallow a genuine audit message without
ever recording it. If you agree with my assessment of this code, I'm
happy to submit a separate patch to fix this.
Thanks,
Chris
kernel/audit.c | 30 ++++++++++++++++++++++++------
kernel/audit.h | 5 +++++
2 files changed, 29 insertions(+), 6 deletions(-)
diff --git a/kernel/audit.c b/kernel/audit.c
index 9bc0b0301198..c48c778b6f89 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -488,12 +488,16 @@ static void auditd_conn_free(struct rcu_head *rcu)
* @pid: auditd PID
* @portid: auditd netlink portid
* @net: auditd network namespace pointer
+ * @ack_status: if AUDIT_ACK_ALWAYS, send ACK before setting connection and
+ * set to AUDIT_ACK_DONE
+ * @skb: socket buffer for sending ACK
*
* Description:
* This function will obtain and drop network namespace references as
* necessary. Returns zero on success, negative values on failure.
*/
-static int auditd_set(struct pid *pid, u32 portid, struct net *net)
+static int auditd_set(struct pid *pid, u32 portid, struct net *net,
+ int *ack_status, struct sk_buff *skb)
{
unsigned long flags;
struct auditd_connection *ac_old, *ac_new;
@@ -508,6 +512,13 @@ static int auditd_set(struct pid *pid, u32 portid, struct net *net)
ac_new->portid = portid;
ac_new->net = get_net(net);
+ if (*ack_status == AUDIT_ACK_ALWAYS) {
+ // Send ACK before we set auditd_conn. Otherwise, the socket buffer may
+ // get filled with backlogged audit messages causing the ACK to be dropped.
+ netlink_ack(skb, nlmsg_hdr(skb), 0, NULL);
+ *ack_status = AUDIT_ACK_DONE;
+ }
+
spin_lock_irqsave(&auditd_conn_lock, flags);
ac_old = rcu_dereference_protected(auditd_conn,
lockdep_is_held(&auditd_conn_lock));
@@ -1201,7 +1212,7 @@ static int audit_replace(struct pid *pid)
return auditd_send_unicast_skb(skb);
}
-static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
+static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh, int *ack_status)
{
u32 seq;
void *data;
@@ -1294,7 +1305,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
/* register a new auditd connection */
err = auditd_set(req_pid,
NETLINK_CB(skb).portid,
- sock_net(NETLINK_CB(skb).sk));
+ sock_net(NETLINK_CB(skb).sk),
+ ack_status,
+ skb);
if (audit_enabled != AUDIT_OFF)
audit_log_config_change("audit_pid",
new_pid,
@@ -1548,15 +1561,20 @@ static void audit_receive(struct sk_buff *skb)
*/
int len;
int err;
+ int ack_status = AUDIT_ACK_ON_ERR;
nlh = nlmsg_hdr(skb);
len = skb->len;
audit_ctl_lock();
while (nlmsg_ok(nlh, len)) {
- err = audit_receive_msg(skb, nlh);
- /* if err or if this message says it wants a response */
- if (err || (nlh->nlmsg_flags & NLM_F_ACK))
+ if (nlh->nlmsg_flags & NLM_F_ACK)
+ ack_status = AUDIT_ACK_ALWAYS;
+
+ err = audit_receive_msg(skb, nlh, &ack_status);
+
+ /* send ACK if err or the message asked for one (and not already sent) */
+ if (ack_status == AUDIT_ACK_ALWAYS || (ack_status == AUDIT_ACK_ON_ERR && err))
netlink_ack(skb, nlh, err, NULL);
nlh = nlmsg_next(nlh, &len);
diff --git a/kernel/audit.h b/kernel/audit.h
index 94738bce40b2..bc692f60567a 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -344,4 +344,9 @@ extern int audit_filter(int msgtype, unsigned int listtype);
extern void audit_ctl_lock(void);
extern void audit_ctl_unlock(void);
+/* Control netlink ACK behaviour in audit_receive. */
+#define AUDIT_ACK_ON_ERR 1
+#define AUDIT_ACK_ALWAYS 2
+#define AUDIT_ACK_DONE 3
+
#endif
--
2.36.6