Re: [linus:master] [connector] c46bfba133: stress-ng.netlink-proc.ops_per_sec -97.2% regression

From: Yin, Fengwei
Date: Sun Jan 14 2024 - 21:22:40 EST




On 1/13/2024 4:56 PM, 王珂琦 Keqi Wang Wang wrote:
Hi Fengwei,
Sorry for the late reply.
No problem.

I don't think this will cause a drastic drop in stress-ng netlink-proc performance, because after -ESRCH is returned, proc_event_num_listeners is cleared and the send_msg function will no longer be called.
However, deciding whether to clear the count based on an -ESRCH return value is problematic, because netlink_broadcast will return -ESRCH in this case. Can you try the following patch to see whether it solves your problem?
Yes. With this patch, the stress-ng.netlink-proc regression is gone.
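
To illustrate the difference between the two approaches discussed above, here is a minimal user-space sketch in C. It is not kernel code: struct cn_model, model_broadcast() and the two send_* helpers are hypothetical names, and the model assumes that a filtered broadcast reports -ESRCH whenever no socket received the message, including the case where a listener is still subscribed but its filter rejected the event.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ESRCH 3 /* numeric value of ESRCH on Linux */

/* Hypothetical user-space model; these are not kernel APIs. */
struct cn_model {
	int subscribed;    /* sockets joined to the multicast group */
	int num_listeners; /* stands in for proc_event_num_listeners */
};

/*
 * ASSUMPTION: the broadcast reports -ESRCH whenever no socket received
 * the message, whether the group is empty or a filter dropped the event.
 */
static int model_broadcast(const struct cn_model *m, bool filter_accepts)
{
	if (m->subscribed == 0 || !filter_accepts)
		return -MODEL_ESRCH;
	return 0;
}

/* Strategy A: clear the count whenever the send returns -ESRCH. */
static void send_reset_on_esrch(struct cn_model *m, bool filter_accepts)
{
	if (model_broadcast(m, filter_accepts) == -MODEL_ESRCH)
		m->num_listeners = 0;
}

/* Strategy B: clear the count only when nobody is subscribed. */
static void send_check_listeners(struct cn_model *m, bool filter_accepts)
{
	if (m->subscribed > 0)
		(void)model_broadcast(m, filter_accepts);
	else
		m->num_listeners = 0;
}

/* One listener is subscribed, but its filter rejects this event. */
static void model_self_check(void)
{
	struct cn_model a = { .subscribed = 1, .num_listeners = 1 };
	struct cn_model b = a;

	send_reset_on_esrch(&a, false);
	send_check_listeners(&b, false);
	assert(a.num_listeners == 0); /* count lost while a listener remains */
	assert(b.num_listeners == 1); /* count preserved */

	b.subscribed = 0;             /* listener exits without unsubscribing */
	send_check_listeners(&b, true);
	assert(b.num_listeners == 0); /* stale count is cleared */
}
```

In this model, strategy A drops the count while a listener is still subscribed, whereas strategy B clears it only once the group is empty, which is the behaviour the patch below aims for with netlink_has_listeners().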


Regards
Yin, Fengwei


From 6e6c36aed156bbb185f54d0c0fef2f6683df3288 Mon Sep 17 00:00:00 2001
From: wangkeqi <wangkeqiwang@xxxxxxxxxxxxxx>
Date: Sat, 23 Dec 2023 13:21:17 +0800
Subject: [PATCH] connector: Fix proc_event_num_listeners count not cleared

When we register a cn_proc listening event, the proc_event_num_listeners
variable is incremented by one, but if PROC_CN_MCAST_IGNORE is
not called, the count is never decremented.
This causes the proc_*_connector functions to take the wrong path.
The problem can be reproduced when the forkstat tool exits via Ctrl+C.
We solve this by checking whether there are still listeners and,
if there are none, clearing proc_event_num_listeners.

Signed-off-by: wangkeqi <wangkeqiwang@xxxxxxxxxxxxxx>
---
drivers/connector/cn_proc.c | 5 ++++-
drivers/connector/connector.c | 6 ++++++
include/linux/connector.h | 1 +
3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 44b19e696..b09f74ed3 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -108,8 +108,11 @@ static inline void send_msg(struct cn_msg *msg)
 		filter_data[1] = 0;
 	}

-	cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT,
+	if (netlink_has_listeners(get_cdev_nls(), CN_IDX_PROC))
+		cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT,
 			     cn_filter, (void *)filter_data);
+	else
+		atomic_set(&proc_event_num_listeners, 0);

 	local_unlock(&local_event.lock);
 }
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 7f7b94f61..ced2655c6 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -120,6 +120,12 @@ int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group,
 }
 EXPORT_SYMBOL_GPL(cn_netlink_send_mult);

+struct sock *get_cdev_nls(void)
+{
+	return cdev.nls;
+}
+EXPORT_SYMBOL_GPL(get_cdev_nls);
+
 /* same as cn_netlink_send_mult except msg->len is used for len */
 int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
 	gfp_t gfp_mask)
diff --git a/include/linux/connector.h b/include/linux/connector.h
index cec2d99ae..c601eb99b 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -126,6 +126,7 @@ int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid,
* If there are no listeners for given group %-ESRCH can be returned.
*/
int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 group, gfp_t gfp_mask);
+struct sock *get_cdev_nls(void);

int cn_queue_add_callback(struct cn_queue_dev *dev, const char *name,
const struct cb_id *id,
--
2.27.0

On 2024/1/12 19:28, "Yin, Fengwei" <fengwei.yin@xxxxxxxxx> wrote:






On 1/11/2024 11:19 PM, kernel test robot wrote:


Hello,

we reviewed this report, and Fengwei (Cc'ed) pointed out that the patch
may break functionality, which in turn causes stress-ng netlink-proc
performance to drop dramatically.

Just FYI, here is what I observed when running the
stress-ng.netlink-proc test:


With or without the patch, cn_netlink_send_mult() returns
-ESRCH in most cases.


The following is what cn_netlink_send_mult() returns while
stress-ng.netlink-proc is running:


...
213801 213801 stress-ng-3 cn_netlink_send_mult -3
213801 213801 stress-ng-spawn cn_netlink_send_mult -3
213801 213801 stress-ng-spawn cn_netlink_send_mult -3
213801 213801 stress-ng-wait cn_netlink_send_mult -3
213802 213802 stress-ng-4 cn_netlink_send_mult -3
213802 213802 stress-ng-spawn cn_netlink_send_mult -3
213802 213802 stress-ng-spawn cn_netlink_send_mult -3
213802 213802 stress-ng-wait cn_netlink_send_mult -3
213803 213803 stress-ng-5 cn_netlink_send_mult -3
213803 213803 stress-ng-dead cn_netlink_send_mult -3
213803 213803 stress-ng-dead cn_netlink_send_mult -3
213802 213802 stress-ng-wait cn_netlink_send_mult -3
213801 213801 stress-ng-wait cn_netlink_send_mult -3
213800 213800 stress-ng-wait cn_netlink_send_mult -3
213799 213799 stress-ng-wait cn_netlink_send_mult -3
213798 213798 stress-ng-wait cn_netlink_send_mult -3
154697 154697 stress-ng cn_netlink_send_mult -3
...
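
The -3 in the last column of the trace is a negated errno value. A small user-space sketch, assuming Linux errno numbering (where ESRCH is 3), makes the mapping explicit; describe_ret() is a hypothetical helper for illustration and is not part of the kernel or of stress-ng.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: turn a kernel-style return value (0 on success,
 * negative errno on failure) into a human-readable string.
 */
static const char *describe_ret(int ret)
{
	return ret < 0 ? strerror(-ret) : "success";
}
```

Calling describe_ret(-3) on Linux yields the same text as strerror(ESRCH), so every line of the trace above reports -ESRCH.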




It looks like resetting proc_event_num_listeners based on a
cn_netlink_send_mult() return value of -3 (-ESRCH) is not accurate.




Regards
Yin, Fengwei