Re: netlink: GPF in sock_sndtimeo

From: Richard Guy Briggs
Date: Tue Dec 13 2016 - 23:17:10 EST

On 2016-12-13 16:17, Cong Wang wrote:
> On Tue, Dec 13, 2016 at 2:52 AM, Richard Guy Briggs <rgb@xxxxxxxxxx> wrote:
> > It is actually the audit_pid and audit_nlk_portid that I care about
> > more. The audit daemon could vanish or close the socket while the
> > kernel sock to which it was attached is still quite valid. Accessing
> > the set of three atomically is the goal. I wonder if it makes more
> > sense to test for the presence of auditd using audit_sock rather than
> > audit_pid, but still keep audit_pid for our reporting and replacement
> > strategy. Another idea would be to put the three in one struct.
> Note, the process that has audit_pid should hold a refcnt to the netns too,
> so the netns can't be gone until that process is gone.

I noted that. I did wonder if there might be a problem if all the
processes were moved to another netns with the struct sock stuck in the
now process-void netns.

This is alluded to in 6f285b19d09f ("audit: Send replies in the proper
network namespace.").
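
For reference, the "three in one struct" idea quoted above can be sketched
in userspace C11 roughly as follows. This is only an illustration, not
kernel code: struct auditd_conn_sketch, conn_publish() and conn_snapshot()
are hypothetical names, the void pointer stands in for the kernel sock, and
the sketch assumes a published struct is never modified in place. It also
ignores when the old struct may be freed, which the kernel would need to
handle separately.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical grouping of the three values discussed in the thread. */
struct auditd_conn_sketch {
    int pid;          /* stands in for audit_pid */
    uint32_t portid;  /* stands in for audit_nlk_portid */
    void *sock;       /* stands in for audit_sock (opaque here) */
};

/* Single published pointer; NULL means "no auditd registered". */
static _Atomic(struct auditd_conn_sketch *) current_conn;

/* Swap in a new connection (or NULL) and return the previous one. */
static struct auditd_conn_sketch *conn_publish(struct auditd_conn_sketch *next)
{
    return atomic_exchange(&current_conn, next);
}

/*
 * Copy all three fields from one load of the pointer, so the caller never
 * sees a pid from one connection paired with a sock from another.
 * Assumes a published struct is immutable while it is reachable.
 */
static bool conn_snapshot(struct auditd_conn_sketch *out)
{
    struct auditd_conn_sketch *c = atomic_load(&current_conn);

    if (!c)
        return false;
    *out = *c;
    return true;
}
```

With this shape, testing for the presence of auditd becomes a NULL check on
the one pointer, while the pid stays available for reporting.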

> > Can someone explain how they think the original test was able to trigger
> > this GPF? Network namespace shutdown while something pretended to set
> > up a new auditd? That's impressive for a fuzzer if that's the case...
> > Is there an strace? I guess it is all in test().
> I am surprised you still don't get the race condition even when you
> are now working on v2...
> The race happens in this scenario:
> 1) Create a new netns
> 2) In the new netns, communicate with kauditd to set audit_sock
> 3) Generate some audit messages, so kauditd will keep sending them
> via audit_sock
> 4) Exit the netns
> 5) The previous audit_sock is now going away, but kauditd could still
> access it in this small window.

Ah ok that fits...
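
The window in steps 1-5 above can be modelled in userspace C as a cached
pointer that outlives the last reference to its object. The sketch below is
single-threaded and uses a plain int counter; struct fake_sock and its
helpers are hypothetical stand-ins, not the kernel's struct sock or its
sock_hold()/sock_put(). It only shows that the object survives netns exit
when the sender holds its own reference. It does not claim that this is the
right fix for the audit code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel socket. */
struct fake_sock {
    int refcnt;   /* number of holders; freed when it drops to 0 */
    bool *freed;  /* set to true on free so the caller can observe it */
};

static struct fake_sock *fake_sock_alloc(bool *freed_flag)
{
    struct fake_sock *sk = malloc(sizeof(*sk));

    if (!sk)
        return NULL;
    sk->refcnt = 1;          /* reference owned by the netns */
    sk->freed = freed_flag;
    *freed_flag = false;
    return sk;
}

static void fake_sock_hold(struct fake_sock *sk)
{
    sk->refcnt++;
}

static void fake_sock_put(struct fake_sock *sk)
{
    if (--sk->refcnt == 0) {
        *sk->freed = true;
        free(sk);
    }
}

/*
 * Replays steps 1-5. Returns true if the socket is still alive when the
 * sender would next use its cached pointer, false if it was already freed.
 */
static bool sock_survives_netns_exit(bool sender_holds_ref)
{
    bool freed;
    struct fake_sock *sk = fake_sock_alloc(&freed);  /* steps 1-2 */
    struct fake_sock *cached;
    bool alive;

    if (!sk)
        abort();
    cached = sk;                 /* step 2: sender caches the pointer */
    if (sender_holds_ref)
        fake_sock_hold(cached);  /* sender pins the object */

    fake_sock_put(sk);           /* step 4: netns exit drops its ref */

    alive = !freed;              /* step 5: is the cached pointer valid? */
    if (alive)
        fake_sock_put(cached);   /* sender releases its own reference */
    return alive;
}
```

Calling sock_survives_netns_exit(false) models the reported case, where the
cached pointer is stale by step 5. Passing true models a sender that pinned
the socket before the netns went away.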


