Re: netlink: GPF in sock_sndtimeo

From: Cong Wang
Date: Tue Dec 13 2016 - 19:31:53 EST

On Tue, Dec 13, 2016 at 2:52 AM, Richard Guy Briggs <rgb@xxxxxxxxxx> wrote:
> It is actually the audit_pid and audit_nlk_portid that I care about
> more. The audit daemon could vanish or close the socket while the
> kernel sock to which it was attached is still quite valid. Accessing
> the set of three atomically is the urge. I wonder if it makes more
> sense to test for the presence of auditd using audit_sock rather than
> audit_pid, but still keep audit_pid for our reporting and replacement
> strategy. Another idea would be to put the three in one struct.

Note, the process has audit_pid should hold a refcnt to the netns too,
so the netns can't be gone until that process is gone.

> Can someone explain how they think the original test was able to trigger
> this GPF? Network namespace shutdown while something pretended to set
> up a new auditd? That's impressive for a fuzzer if that's the case...
> Is there an strace? I guess it is all in test().

I am surprised you still don't get the race condition even when you
are now working on v2...

The race happens in this scenarios :

1) Create a new netns

2) In the new netns, communicate with kauditd to set audit_sock

3) Generate some audit messages, so kauditd will keep sending them
via audit_sock

4) exit the netns

5) the previous audit_sock is now going away, but kaudit_sock could still
access it in this small window.