Re: [PATCH V2] audit: try harder to send to auditd upon netlink failure

From: Paul Moore
Date: Wed Sep 09 2015 - 16:42:00 EST


On Monday, September 07, 2015 05:10:13 AM Richard Guy Briggs wrote:
> There are several reports of the kernel losing contact with auditd when
> it is, in fact, still running. When this happens, kernel syslogs show:
> "audit: *NO* daemon at audit_pid=<pid>"
> although auditd is still running, and is apparently happy, listening on
> the netlink socket. The pid in the "*NO* daemon" message matches the pid
> of the running auditd process. Restarting auditd solves this.
>
> The problem appears to happen randomly, and doesn't seem to be strongly
> correlated to the rate of audit events being logged. The problem
> happens fairly regularly (every few days), but has not yet been
> reproduced on demand.
>
> On production kernels, BUG_ON() is a no-op, so any error will trigger
> this.
>
> Commit 34eab0a7cd45 ("audit: prevent an older auditd shutdown from
> orphaning a newer auditd startup") eliminates one possible cause. This
> isn't the case here, since the PID in the error message and the PID of
> the running auditd match.
>
> The primary expected cause of error here is -ECONNREFUSED when the audit
> daemon goes away, when netlink_getsockbyportid() can't find the auditd
> portid entry in the netlink audit table (or there is no receive
> function). If -EPERM is returned, that situation isn't likely to be
> resolved in a timely fashion without administrator intervention. In
> both cases, reset the audit_pid. This does not rule out a race
> condition. SELinux is expected to return zero since this isn't an INET
> or INET6 socket. Other LSMs may have other return codes. Log the error
> code for better diagnosis in the future.
>
> In the case of -ENOMEM, the situation could be temporary, based on local
> or general availability of buffers. -EAGAIN should never happen since
> the netlink audit (kernel) socket is set to MAX_SCHEDULE_TIMEOUT.
> -ERESTARTSYS and -EINTR are not expected since this kernel thread is not
> expected to receive signals. In these cases (or any other unexpected
> ones for now), report the error and re-schedule the thread, retrying up
> to 5 times.
>
> v2:
> Removed BUG_ON().
> Moved comma in pr_*() statements.
> Removed audit_strerror() text.
>
> Reported-by: Vipin Rathor <v.rathor@xxxxxxxxx>
> Reported-by: <ctcard@xxxxxxxxxxx>
> Signed-off-by: Richard Guy Briggs <rgb@xxxxxxxxxx>
> ---
> kernel/audit.c | 24 +++++++++++++++++++-----
> 1 files changed, 19 insertions(+), 5 deletions(-)

This will be queued up for linux-audit#next as soon as 4.3-rc1 is released.

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 1c13e42..18cdfe2 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -407,16 +407,30 @@ static void audit_printk_skb(struct sk_buff *skb)
> static void kauditd_send_skb(struct sk_buff *skb)
> {
> int err;
> + int attempts = 0;
> +#define AUDITD_RETRIES 5
> +
> +restart:
> /* take a reference in case we can't send it and we want to hold it */
> skb_get(skb);
> err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
> if (err < 0) {
> - BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
> + pr_err("netlink_unicast sending to audit_pid=%d returned error: %d\n",
> + audit_pid, err);
> if (audit_pid) {
> - pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
> - audit_log_lost("auditd disappeared");
> - audit_pid = 0;
> - audit_sock = NULL;
> + if (err == -ECONNREFUSED || err == -EPERM
> + || ++attempts >= AUDITD_RETRIES) {
> + audit_log_lost("audit_pid=%d reset");
> + audit_pid = 0;
> + audit_sock = NULL;
> + } else {
> + pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
> + attempts, audit_pid);
> + set_current_state(TASK_INTERRUPTIBLE);
> + schedule();
> + __set_current_state(TASK_RUNNING);
> + goto restart;
> + }
> }
> /* we might get lucky and get this in the next auditd */
> audit_hold_skb(skb);

--
paul moore
security @ redhat
