RE: [PATCH] netlink: introduce netlink poll to resolve fast return issue

From: Jong eon Park
Date: Mon Nov 06 2023 - 21:05:23 EST

> -----Original Message-----
> From: Jakub Kicinski <kuba@xxxxxxxxxx>
> Sent: Tuesday, November 7, 2023 8:48 AM
> To: Jong eon Park <jongeon.park@xxxxxxxxxxx>; Paolo Abeni
> <pabeni@xxxxxxxxxx>
> Cc: David S. Miller <davem@xxxxxxxxxxxxx>; Eric Dumazet
> <edumazet@xxxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; Dong ha Kang <dongha7.kang@xxxxxxxxxxx>
> Subject: Re: [PATCH] netlink: introduce netlink poll to resolve fast
> return issue
>
> On Fri, 3 Nov 2023 16:22:09 +0900 Jong eon Park wrote:
> > In very rare cases, there was an issue where a user's poll function
> > waiting for a uevent would continuously return very quickly, causing
> > excessive CPU usage due to the following scenario.
> >
> > Once sk_rcvbuf becomes full, netlink_broadcast_deliver returns an error
> > and netlink_overrun is called. However, if netlink_overrun runs in one
> > context just before another context returns from poll and invokes recv,
> > emptying the rcvbuf, sk->sk_err = ENOBUFS is written to the netlink
> > socket belatedly and the socket enters the NETLINK_S_CONGESTED state.
> > If the user does not check for POLLERR, it never consumes and clears
> > sk_err, and it keeps calling poll only to have it return immediately.
> >
> > To address this issue, I would like to introduce the following netlink
> > poll.
> >
> > After calling datagram_poll, netlink poll checks the NETLINK_S_CONGESTED
> > state and the receive queue, and reports the socket as readable once
> > more even if the user has already emptied the receive queue. This lets
> > the user consume the sk->sk_err value through netlink_recvmsg, so the
> > situation described above is avoided.
>
> The explanation makes sense, but I'm not able to make the jump in
> understanding how this is a netlink problem. datagram_poll() returns
> EPOLLERR because sk_err is set, what makes netlink special?
> The fact that we can have an sk_err with nothing in the recv queue?
>
> Paolo understands this better, maybe he can weigh in tomorrow...

Perhaps my explanation was not comprehensive enough.

The issue is that once this happens, the user cannot escape from the
"busy running" loop, and a user's inadequate handling of EPOLLERR ends up
imposing a heavy CPU burden on the entire system, which seems quite harsh.
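For reference, a userspace receiver can avoid the busy loop by reading
whenever poll() reports POLLERR, not only POLLIN: on a netlink socket with
a pending overrun, the first recv() returns -1 with errno ENOBUFS and
clears sk_err. This is a minimal sketch assuming a datagram socket;
drain_socket() and struct drain_result are hypothetical names used only
for illustration.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Result of one drain pass (hypothetical helper, not a kernel/libc API). */
struct drain_result {
	int messages; /* datagrams successfully read */
	int overruns; /* ENOBUFS errors consumed: some messages were dropped */
	int fatal;    /* errno of an unexpected error, 0 if none */
};

/*
 * Call after poll() reports POLLIN and/or POLLERR on a datagram socket.
 * On a netlink socket with a pending overrun, recv() returns -1/ENOBUFS
 * and resets sk_err; skipping the read on POLLERR would leave sk_err set,
 * so the next poll() would return immediately again.
 */
struct drain_result drain_socket(int fd, short revents)
{
	struct drain_result r = {0, 0, 0};
	char buf[8192];

	if (!(revents & (POLLIN | POLLERR)))
		return r;

	/* Read until the queue is empty so the next poll() can block. */
	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

		if (n >= 0) {
			r.messages++;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno == ENOBUFS) {
			/* Overrun reported and cleared; caller should resync. */
			r.overruns++;
			continue;
		}
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			r.fatal = errno;
		break;
	}
	return r;
}
```

Draining until EAGAIN, rather than reading once, ensures the receive queue
is empty before the next poll() call.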

A separate netlink poll is needed because of netlink's own state. Once a
socket enters the NETLINK_S_CONGESTED state, no further skbs can be
delivered to it, and the state is cleared only when the receive_queue is
completely empty at the end of a receive (netlink_rcv_wake(), called from
netlink_recvmsg()). However, we found that the NETLINK_S_CONGESTED state
could remain set even though the receive_queue was already empty, which is
incorrect, and that is why I added the handling in poll.
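To make the sequence concrete, here is a small userspace model of the
state involved; it is not kernel code and not the patch itself.
struct nl_model, model_overrun(), model_datagram_poll(),
model_netlink_poll() and model_recv() are hypothetical names, and the
model assumes only the behaviour described in this thread: a late overrun
leaves sk_err set and the socket congested with an empty queue,
datagram_poll() then reports only POLLERR, and the proposed poll also
reports POLLIN so that a reader reacting only to POLLIN calls recv,
consumes the error, and lets the congestion clear.

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>

/* Hypothetical model of the relevant netlink socket state. */
struct nl_model {
	int queued;     /* skbs in sk_receive_queue */
	int sk_err;     /* pending socket error, e.g. ENOBUFS */
	bool congested; /* models NETLINK_S_CONGESTED */
};

/* Models a netlink_overrun() that lands after the queue was emptied. */
void model_overrun(struct nl_model *s)
{
	s->sk_err = ENOBUFS;
	s->congested = true;
}

/* Simplified view of what datagram_poll() reports for this state. */
short model_datagram_poll(const struct nl_model *s)
{
	short mask = 0;

	if (s->sk_err)
		mask |= POLLERR;
	if (s->queued > 0)
		mask |= POLLIN;
	return mask;
}

/* Proposed behaviour: also readable when congested with an empty queue. */
short model_netlink_poll(const struct nl_model *s)
{
	short mask = model_datagram_poll(s);

	if (s->congested && s->queued == 0)
		mask |= POLLIN;
	return mask;
}

/*
 * Models netlink_recvmsg(): a pending error is returned (and cleared)
 * first; then, as in netlink_rcv_wake(), congestion is cleared once the
 * receive queue is empty.
 */
int model_recv(struct nl_model *s)
{
	int ret;

	if (s->sk_err) {
		ret = -s->sk_err;
		s->sk_err = 0;
	} else if (s->queued > 0) {
		s->queued--;
		ret = 1;
	} else {
		ret = -EAGAIN;
	}
	if (s->queued == 0)
		s->congested = false;
	return ret;
}
```

With model_datagram_poll() alone, a reader that only reacts to POLLIN
never calls model_recv(), so every poll returns POLLERR immediately; with
model_netlink_poll(), one recv is enough to return to the idle state.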

I don't consider this approach to be the best one, so I would appreciate
any recommendations for a better solution.

Regards.
JE Park.