Re: [PATCH net-next 3/3] net: dql: Optimize stall information population

From: Eric Dumazet
Date: Thu Apr 04 2024 - 12:36:26 EST


On Thu, Apr 4, 2024 at 5:00 PM Breno Leitao <leitao@xxxxxxxxxx> wrote:
>
> When Dynamic Queue Limit (DQL) is set, it always populate stall
> information through dql_queue_stall(). However, this information is
> only necessary if a stall threshold is set, stored in struct
> dql->stall_thrs.
>
> dql_queue_stall() is cheap, but not free, since it does have memory
> barriers and so forth.
>
> Do not call dql_queue_stall() if there is no stall threshold set, and
> save some CPU cycles.
>
> Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
> ---
> include/linux/dynamic_queue_limits.h | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
> index 9980df0b7247..869afb800ea1 100644
> --- a/include/linux/dynamic_queue_limits.h
> +++ b/include/linux/dynamic_queue_limits.h
> @@ -137,7 +137,9 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
>
> dql->num_queued += count;
>
> - dql_queue_stall(dql);
> + /* Only populate stall information if the threshold is set */
> + if (READ_ONCE(dql->stall_thrs))
> + dql_queue_stall(dql);

Note that this field is not in the first cache line of 'struct dql',
we will have some false sharing.

Perhaps we could copy it in a hole of the first cache line (used by producers).

struct dql {
unsigned int num_queued; /* 0 0x4 */
unsigned int adj_limit; /* 0x4 0x4 */
unsigned int last_obj_cnt; /* 0x8 0x4 */

/* XXX 4 bytes hole, try to pack */

unsigned long history_head; /* 0x10 0x8 */
unsigned long history[4]; /* 0x18 0x20 */

/* XXX 8 bytes hole, try to pack */

/* --- cacheline 1 boundary (64 bytes) --- */
unsigned int limit __attribute__((__aligned__(64))); /*
0x40 0x4 */
unsigned int num_completed; /* 0x44 0x4 */
unsigned int prev_ovlimit; /* 0x48 0x4 */
unsigned int prev_num_queued; /* 0x4c 0x4 */
unsigned int prev_last_obj_cnt; /* 0x50 0x4 */
unsigned int lowest_slack; /* 0x54 0x4 */
unsigned long slack_start_time; /* 0x58 0x8 */
unsigned int max_limit; /* 0x60 0x4 */
unsigned int min_limit; /* 0x64 0x4 */
unsigned int slack_hold_time; /* 0x68 0x4 */
unsigned short stall_thrs; /* 0x6c 0x2 */
unsigned short stall_max; /* 0x6e 0x2 */
unsigned long last_reap; /* 0x70 0x8 */
unsigned long stall_cnt; /* 0x78 0x8 */

/* size: 128, cachelines: 2, members: 19 */
/* sum members: 116, holes: 2, sum holes: 12 */
/* forced alignments: 1, forced holes: 1, sum forced holes: 8 */
};