Re: [PATCH] audit: add backlog high water mark metric

From: Paul Moore

Date: Fri Apr 10 2026 - 17:34:27 EST


On Mon, Mar 23, 2026 at 11:07 AM Ricardo Robaina <rrobaina@xxxxxxxxxx> wrote:
>
> Currently, determining the optimal `audit_backlog_limit` relies on
> instantaneous polling of the queue size. This misses transient
> micro-bursts, making it difficult for system administrators to know
> if their queue is adequately sized or if they are at risk of
> dropping events.
>
> This patch introduces `backlog_max_depth`, a high-water mark metric
> that tracks the maximum number of buffers in the audit queue since
> the system was booted or the metric was last reset. To minimize
> performance overhead in the fast-path, the metric is updated using
> a lockless cmpxchg loop in `__audit_log_end()`.
>
> Userspace can read-and-clear this metric by sending an `AUDIT_SET`
> message with the `AUDIT_STATUS_BACKLOG_MAX_DEPTH` mask. To support
> periodic telemetry polling (e.g., statsd, Prometheus), the reset
> operation atomically returns the snapshot of the high-water mark
> right before zeroing it, ensuring no peaks are lost between polls.
>
> Link: https://github.com/linux-audit/audit-kernel/issues/63
> Suggested-by: Steve Grubb <sgrubb@xxxxxxxxxx>
> Signed-off-by: Ricardo Robaina <rrobaina@xxxxxxxxxx>
> ---
> include/linux/audit.h | 3 ++-
> include/uapi/linux/audit.h | 2 ++
> kernel/audit.c | 32 ++++++++++++++++++++++++++++++++
> 3 files changed, 36 insertions(+), 1 deletion(-)
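
For anyone following along, the two operations described above (a
lockless max update and an atomic read-and-clear) can be sketched in
userspace with C11 atomics. This is only an illustration, not the code
from the patch: the function names hwm_update() and
hwm_read_and_clear() are hypothetical, and the kernel would use its own
atomic helpers rather than <stdatomic.h>.

```c
#include <stdatomic.h>

/* Illustrative stand-in for the metric; the real patch keeps its own state. */
static atomic_uint backlog_max_depth;

/* Hypothetical helper: raise the high-water mark if `depth` exceeds it. */
static void hwm_update(unsigned int depth)
{
	unsigned int old = atomic_load_explicit(&backlog_max_depth,
						memory_order_relaxed);

	/* Retry only while our value is still larger than the stored one. */
	while (depth > old) {
		/* On failure `old` is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(&backlog_max_depth,
							  &old, depth,
							  memory_order_relaxed,
							  memory_order_relaxed))
			break;
	}
}

/* Hypothetical helper: return the peak and zero it in a single atomic step. */
static unsigned int hwm_read_and_clear(void)
{
	return atomic_exchange_explicit(&backlog_max_depth, 0,
					memory_order_relaxed);
}
```

Because the reset is a single exchange, a peak recorded between two
polls lands either in the value returned or in the next interval; it
is never overwritten by the zeroing.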

I sat on this for a bit because I wanted to think on it for a while.
While I agree audit could benefit from better statistics around
queue/backlog status, I'm not sure a single "max" value alone is worth
a bit in the audit_status bitmask. My concern is that the max queue
length only provides a single snapshot of what the queue looked like;
it doesn't give any indication of the average queue length over a span
of time. Some audit users are willing to live with occasional drops,
and the max size doesn't help them arrive at a good balance.

As for the users who can't tolerate any audit record drops? They
shouldn't be running with a backlog limit anyway, so the maximum queue
value will be of limited use.

--
paul-moore.com