Re: [PATCH v2] audit: use proper refcount locking on audit_sock

From: Richard Guy Briggs
Date: Tue Dec 13 2016 - 10:01:54 EST

Next message: Agustin Vega-Frias: "Re: [PATCH V8 1/3] ACPI: Add support for ResourceSource/IRQ domain mapping"
Previous message: Paul E. McKenney: "Re: [PATCH] rcu: shift by 1UL rather than 1 to fix sign extension error"
In reply to: Richard Guy Briggs: "Re: [PATCH v2] audit: use proper refcount locking on audit_sock"
Next in thread: Cong Wang: "Re: [PATCH v2] audit: use proper refcount locking on audit_sock"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2016-12-13 00:10, Richard Guy Briggs wrote:
> On 2016-12-12 15:18, Paul Moore wrote:
> > On Mon, Dec 12, 2016 at 5:03 AM, Richard Guy Briggs <rgb@xxxxxxxxxx> wrote:
> > > Resetting audit_sock appears to be racy.
> > >
> > > audit_sock was being copied and dereferenced without using a refcount on
> > > the source sock.
> > >
> > > Bump the refcount on the underlying sock when we store a refrence in
> > > audit_sock and release it when we reset audit_sock. audit_sock
> > > modification needs the audit_cmd_mutex.
> > >
> > > See: https://lkml.org/lkml/2016/11/26/232
> > >
> > > Thanks to Eric Dumazet <edumazet@xxxxxxxxxx> and Cong Wang
> > > <xiyou.wangcong@xxxxxxxxx> on ideas how to fix it.
> > >
> > > Signed-off-by: Richard Guy Briggs <rgb@xxxxxxxxxx>
> > > ---
> > > There has been a lot of change in the audit code that is about to go
> > > upstream to address audit queue issues. This patch is based on the
> > > source tree: git://git.infradead.org/users/pcmoore/audit#next
> > > ---
> > > kernel/audit.c | 34 ++++++++++++++++++++++++++++------
> > > 1 files changed, 28 insertions(+), 6 deletions(-)
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index f20eee0..439f7f3 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -452,7 +452,9 @@ static void auditd_reset(void)
> > > struct sk_buff *skb;
> > >
> > > /* break the connection */
> > > + sock_put(audit_sock);
> > > audit_pid = 0;
> > > + audit_nlk_portid = 0;
> > > audit_sock = NULL;
> > >
> > > /* flush all of the retry queue to the hold queue */
> > > @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
> > > if (rc >= 0) {
> > > consume_skb(skb);
> > > rc = 0;
> > > + } else {
> > > + if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
> >
> > I dislike the way you wrote this because instead of simply looking at
> > this to see if it correct I need to sort out all the bits and find out
> > if there are other error codes that could run afoul of this check ...
> > make it simple, e.g. (rc == -ENOMEM || rc == -EPERM || ...).
> > Actually, since EPERM is 1, -EPERM (-1 in two's compliment is
> > 0xffffffff) is going to cause this to be true for pretty much any
> > value of rc, yes?
>
> Yes, you are correct. We need there a logical or on the results of each
> comparison to the return code rather than bit-wise or-ing the result
> codes together first to save a step.
>
> > > + mutex_lock(&audit_cmd_mutex);
> > > + auditd_reset();
> > > + mutex_unlock(&audit_cmd_mutex);
> > > + }
> >
> > The code in audit#next handles netlink_unicast() errors in
> > kauditd_thread() and you are adding error handling code here in
> > kauditd_send_unicast_skb() ... that's messy. I don't care too much
> > where the auditd_reset() call is made, but let's only do it in one
> > function; FWIW, I originally put the error handling code in
> > kauditd_thread() because there was other error handling code that
> > needed to done in that scope so it resulted in cleaner code.
>
> Hmmm, I seem to remember it not returning the return code and I thought
> I had changed it to do so, but I see now that it was already there.
> Agreed, I needlessly duplicated that error handling.
>
> > Related, I see you are now considering ENOMEM to be a fatal condition,
> > that differs from the AUDITD_BAD macro in kauditd_thread(); this
> > difference needs to be reconciled.
>
> Also correct about -EPERM now that I check back to the intent of commit
> 32a1dbaece7e ("audit: try harder to send to auditd upon netlink
> failure")
>
> > Finally, you should update the comment header block for auditd_reset()
> > that it needs to be called with the audit_cmd_mutex held.
>
> Yup.
>
> > > @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > > return -EACCES;
> > > }
> > > if (audit_pid && new_pid &&
> > > - audit_replace(requesting_pid) != -ECONNREFUSED) {
> > > + (audit_replace(requesting_pid) & (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> >
> > Do we simply want to treat any error here as fatal, and not just
> > ECONN/EPERM/ENOMEM? If not, let's come up with a single macro to
> > handle the fatal netlink_unicast() return codes so we have some chance
> > to keep things consistent in the future.
>
> I'll work through this before I post another patch...

Ok, I've gone back to look at the reasoning in commit 133e1e5acd4a
("audit: stop an old auditd being starved out by a new auditd") which
suggests only ECONNREFUSED can cause an audit_pid replace, so I've
returned it to its original state.

I'll post another tested patch, but I'm still not that happy that it
does not proactively reset audit_pid, audit_nlk_portid and audit_sock
when auditd's socket has a problem. I'll leave the test run overnight.

> > paul moore
>
> - RGB

- RGB

--
Richard Guy Briggs <rgb@xxxxxxxxxx>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

Next message: Agustin Vega-Frias: "Re: [PATCH V8 1/3] ACPI: Add support for ResourceSource/IRQ domain mapping"
Previous message: Paul E. McKenney: "Re: [PATCH] rcu: shift by 1UL rather than 1 to fix sign extension error"
In reply to: Richard Guy Briggs: "Re: [PATCH v2] audit: use proper refcount locking on audit_sock"
Next in thread: Cong Wang: "Re: [PATCH v2] audit: use proper refcount locking on audit_sock"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]