Re: [PATCH 4.10 070/111] audit: fix auditd/kernel connection state tracking

From: Paul Moore
Date: Wed Feb 21 2018 - 18:04:40 EST


On Wed, Feb 21, 2018 at 3:46 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
>
>> On Tue, Feb 20, 2018 at 10:18 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > On Tue, Feb 20, 2018 at 09:51:08AM -0500, Paul Moore wrote:
>> >> On Tue, Feb 20, 2018 at 9:06 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> >
>> >> > It's not at all clear to me what that code does, I just stumbled upon
>> >> > __mutex_owner() outside of the mutex code itself and went WTF.
>> >>
>> >> If you don't want people to use __mutex_owner() outside of the mutex
>> >> code I might suggest adding a rather serious comment at the top of the
>> >> function, because right now I don't see anything suggesting that
>> >> function shouldn't be used. Yes, there is the double underscore
>> >> prefix, but that can mean a few different things these days.
>> >
>> > Find below.
>> >
>> >> > The comment (aside from having the most horribly style) ...
>> >>
>> >> Yeah, your dog is ugly too. Notice how neither comment is constructive?
>> >
>> > I'm sure you've seen this one:
>> >
>> > https://lkml.org/lkml/2016/7/8/625
>>
>> Yep. I stand behind my earlier comment in this thread.
>>
>> >> > Maybe if you could explain how that code is supposed to work and why it
>> >> > doesn't know if it holds a lock I could make a suggestion...
>> >>
>> >> I just spent a few minutes looking back over the bits available in
>> >> include/linux/mutex.h and I'm not seeing anything beyond
>> >> __mutex_owner() which would allow us to determine the mutex owning
>> >> task. It's probably easiest for us to just track ownership ourselves.
>> >> I'll put together a patch later today.
>> >
>> > Note that up until recently the mutex implementation didn't even have a
>> > consistent owner field. And the thing is, it's very easy to use wrong,
>> > only today I've seen a patch do: "__mutex_owner() == task", where task
>> > was allowed to be !current, which is just wrong.
>>
>> Arguably all the more reason why a strongly worded warning is
>> important (which I see you've included below, feel free to include my
>> Reviewed-by).
>>
>> > Looking through kernel/audit.c I'm not even sure I see how you would end
>> > up in audit_log_start() with audit_cmd_mutex held.
>> >
>> > Can you give me a few code paths that trigger this? Simple git-grep is
>> > failing me.
>>
>> Basically look at the code in audit_receive_msg(), but I wasn't asking
>> your opinion on how we should rewrite the audit subsystem, I was just
>> asking how one could determine if the current task was holding a given
>> mutex in a way that was acceptable to you. Based on your comments,
>> and some further inspection of the mutex code, it appears that is/was
>> not something that the core mutex code wants to support/make-visible.
>> Which is perfectly fine, I just wanted to make sure I wasn't missing
>> something before I went ahead and wrote a wrapper around the mutex
>> code for use by audit.
>>
>> FWIW, I just put together the following patch which removes the
>> __mutex_owner() call from audit and doesn't appear to break anything
>> on the audit side (you're CC'd on the patch). It has only been
>> lightly tested, but I'm going to bang on it for a day or so and if I
>> hear no objections I'll merge it into audit/next.
>>
>> * https://www.redhat.com/archives/linux-audit/2018-February/msg00066.html
>
> Could you please explain the audit_ctl_lock()/unlock() primitive you are
> introducing there? You seem to be implementing some sort of recursive locking
> primitive, but in a strange way.
>
> AFAICS the primary problem appears to be this code path:
>
> audit_receive() -> audit_receive_msg() -> AUDIT_TTY_SET -> audit_log_common_recv_msg() -> audit_log_start()
>
> where we can arrive already holding the lock.
>
> I.e. recursive mutex, kinda.
>
> What's the thinking there? Neither the changelog nor the code explains this.

I don't really go into great detail in the changelog, or comments in
the code, because I'm not really doing anything new with respect to
locking in this commit. The patch simply wraps the existing
mutex_{lock,unlock}() calls so that we can track the mutex owner. It
doesn't fundamentally change the locking, it's a quick patch to get
rid our our __mutex_owner() usage as Peter doesn't want anyone,
outside the mutex code, to use that function.

Based on your comments above, I'm guessing some of the
misunderstanding comes from the
__mutex_owner()/audit_ctl_owner_current() call in audit_log_start().
We try to determine the mutex/lock owner in audit_log_start() not
because we are trying to avoid a recursive lock, we do the check as an
optimization to skip the normal queue managment so that the lock
holder isn't subject to the same rescheduling/queue-management (is
"queue calming" a term?) as regular tasks.

--
paul moore
www.paul-moore.com