Re: [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry
From: Steve Grubb
Date: Tue Jun 09 2026 - 13:52:39 EST
Hello,
On Thursday, May 28, 2026 7:29:01 PM Eastern Daylight Time Jakub Kicinski
wrote:
> On Thu, 28 May 2026 18:40:44 -0400 Steve Grubb wrote:
> > > > (3) A new NETLINK_F_RECV_NO_ENOBUFS socket flag doesn't exist in
> > > > stable
> > > > kernels where this bug is actively impacting users
> > >
> > > Which commit are you referring to? Isn't that flag itself ancient?
> >
> > You're right, it is. I see how this flag would fix the pathological
> > behavior that was reported. But as I have looked at this suggestion,
> > there seems to be one wrinkle. User space should not need to know that
> > the audit code in the kernel has this retry mechanism.
>
> It's not about the retry mechanism, at least in my mind - I read
> your reply as "user space should not know that there was congestion".
> Why?
In the audit case, it is not useful. I know there can be an endless supply
of events, and there's not much that can be done except dequeue whatever is
next.
> It's not very useful, I get that, but user space can just clear
> the congestion signal and keep going.
How? The recvfrom man page doesn't even discuss ENOBUFS, which is one of the
strongest arguments for a kernel-side patch. The fact that there exists a
socket option to declare that you do not want ENOBUFS on netlink sockets is
esoteric knowledge. The netlink(7) man page does cover the flag. But even
where it discusses ENOBUFS, it does not mention that this is preventable by
setting a socket option. I do appreciate this being pointed out. But getting
from the recvfrom man page to a solution is not obvious.
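
For reference, the user-space side might look roughly like the sketch below.
It assumes the option in question is the one netlink(7) documents as
NETLINK_NO_ENOBUFS, set through setsockopt() at the SOL_NETLINK level. The
helper names are made up for illustration and are not taken from auditd or
libaudit. The second helper shows the alternative of leaving the option off
and treating ENOBUFS as a "records were dropped, keep reading" signal.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270 /* fallback for older libc headers */
#endif

/*
 * Hypothetical helper: ask the kernel not to report ENOBUFS on this
 * netlink socket. Returns 0 on success, -1 with errno set on failure.
 */
static int netlink_disable_enobufs(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS,
			  &one, sizeof(one));
}

/*
 * Hypothetical helper: receive one datagram. ENOBUFS is counted in
 * *overruns (if non-NULL) and the read is retried rather than treated
 * as fatal. EINTR is retried as well. Any other error is returned.
 */
static ssize_t netlink_recv_tolerant(int fd, void *buf, size_t len,
				     unsigned int *overruns)
{
	for (;;) {
		ssize_t n = recv(fd, buf, len, 0);

		if (n >= 0)
			return n;
		if (errno == ENOBUFS) {
			if (overruns)
				(*overruns)++;
			continue;
		}
		if (errno == EINTR)
			continue;
		return -1;
	}
}
```

Either approach works only for someone who already knows the option exists
or that ENOBUFS on a netlink socket is recoverable, which is the
discoverability problem.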
> > It seems like the audit subsystem should set the flag on auditd's
> > socket at registration time in auditd_set(). The kernel is the right
> > place for this because it's the kernel that manages the retry/ hold
> > queues and sets the sk_sndtimeo that triggers the overrun path -
> > auditd has no knowledge of these internals.
>
> We have to carry this code somewhere, either in user space or in
> the kernel. I'd prefer not to carry it in the kernel.
I can put this in the audit daemon. But whoever else writes a similar app
will have to independently discover the same solution when faced with the
same pathological behavior. A kernel-side fix would have made it easier for
future app developers to be successful.
-Steve