Re: [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive()

From: Paul Moore
Date: Tue May 23 2023 - 17:07:43 EST


On Mon, May 22, 2023 at 11:58 PM Eiichi Tsukata
<eiichi.tsukata@xxxxxxxxxxx> wrote:
> > On May 22, 2023, at 13:44, Eiichi Tsukata <eiichi.tsukata@xxxxxxxxxxx> wrote:
> >> On May 20, 2023, at 5:54, Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >> On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@xxxxxxxxxxx> wrote:
> >>>
> >>> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
> >>> call wakes up only one process as waiter side uses exclusive wait.
> >>> This can be problematic when there are multiple processes (one is in
> >>> audit_receive() and others are in audit_log_start()) waiting on
> >>> audit_backlog_wait queue.
> >>>
> >>> For example, if there are two processes waiting:
> >>>
> >>> Process (A): in audit_receive()
> >>> Process (B): in audit_log_start()
> >>>
> >>> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
> >>> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
> >>> result, (B) can be blocked for up to backlog_wait_time.
> >>>
> >>> To prevent the issue, use non-exclusive wait in audit_receive() so that
> >>> kauditd can wake up all waiters in audit_receive().
> >>>
> >>> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
> >>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@xxxxxxxxxxx>
> >>> ---
> >>> kernel/audit.c | 17 +++++++++++------
> >>> 1 file changed, 11 insertions(+), 6 deletions(-)
> >>
> >> This was also discussed in the last patchset.
> >
> > This bug is much easily reproducible on real environments and can cause problematic
> > user space failure like SSH connection timeout.
> > Let’s not keep the bug unfixed.
> > (Of course we’ve already carefully tuned audit related params and user space auditd config so that our product won’t hit backlog full.)

Good. Resolving your issues through audit runtime configuration is
the proper solution to this.

> > BTW, the default backlog_wait_time is 60 * HZ which seems pretty large.
> > I’d appreciate if you could tell me the reason behind that value.

I do not recall the original logic behind that value. It is likely
that the original value predated my maintenance of the audit
subsystem.

--
paul-moore.com