Re: [PATCH v2 4/5] audit: check if audit_queue is full after prepare_to_wait_exclusive()

From: Paul Moore
Date: Tue May 23 2023 - 17:02:49 EST


On Mon, May 22, 2023 at 12:28 AM Eiichi Tsukata
<eiichi.tsukata@xxxxxxxxxxx> wrote:
> > On May 20, 2023, at 5:54, Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> > On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@xxxxxxxxxxx> wrote:
> >>
> >> Commit 7ffb8e317bae ("audit: we don't need to
> >> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> >> before add_wait_queue_exclusive() which introduced the following race:
> >>
> >> CPU1 CPU2
> >> ======== ========
> >> (in audit_log_start()) (in kauditd_thread())
> >>
> >> @audit_queue is full
> >> wake_up(&audit_backlog_wait)
> >> wait_event_freezable()
> >> add_wait_queue_exclusive()
> >> ...
> >> schedule_timeout()
> >>
> >> Once this happens, both audit_log_start() and kauditd_thread() can cause
> >> deadlock for up to backlog_wait_time waiting for each other. To prevent
> >> the race, this patch adds @audit_queue full check after
> >> prepare_to_wait_exclusive() and call schedule_timeout() only if the
> >> queue is full.
> >>
> >> Fixes: 7ffb8e317bae ("audit: we don't need to __set_current_state(TASK_RUNNING)")
> >> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@xxxxxxxxxxx>
> >> ---
> >> kernel/audit.c | 12 ++++++++++--
> >> 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > I discussed my concerns with this patch in the last patchset, and I
> > believe they still apply here.
> >
>
> Please refer to the implementation of ___wait_event().
> It checks the condition *after* prepare_to_wait_event().
>
> Another similar example in the kernel code is unix_wait_for_peer().
> It checks unix_recvq_full() after prepare_to_wait_exclusive().
>
> I’m assuming this is a logical bug needs to be fixed.

I disagree, see my previous comments. The fixes you've presented do
not eliminate the possibility of rescheduling which could result in
some of the issues you've described. The proper fix for systems which
are sensitive to long scheduling delays such as this is to adjust your
audit runtime configuration so that audit does not block userspace.
Suggestions include removing the backlog limit and/or shortening the
wait time.

--
paul-moore.com