Re: [rfc/rft][patch] should use scheduler sync hint in tcp_prequeue()?

From: Eric Dumazet
Date: Tue Mar 02 2010 - 06:28:04 EST

On Tuesday, 02 March 2010 at 10:41 +0100, Mike Galbraith wrote:
> Greetings network land.
>
> The reason for this query is that wake_affine() fails if there is one
> and only one task on a runqueue to encourage tasks spreading out, which
> increases cpu utilization. However, for tasks which are communicating
> at high frequency, the cost of the resulting cache misses, should
> partners land in non-shared caches, is horrible to behold. My Q6600 has
> shared caches, which may or may not be hit IFF something perturbs the
> system, and bounces partner to the right core. That won't happen on a
> box with no shared caches of course, and even with shared caches
> available, the pain is highly visible in the TCP numbers below.
>
> The sync hint tells wake_affine() that the waker is likely going to
> sleep soonish, so it subtracts the waker from the load imbalance
> calculation, allowing the partner task to be awakened affine. In the
> shared cache available case, that is also an enabler that the task be
> placed in an idle shared cache, which can increase throughput quite a
> bit (see .31 vs .33 AF UNIX), or may cost a bit if there is little to no
> execution overlap (see pipe).
>
> Now, I _could_ change wake_affine() to globally succeed in the one task
> case, but am loath to do so because that very well may upset delicate
> load balancing apple cart. I think it's much safer to target the spot
> that I know hurts like hell. Thoughts?
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 34f5cc2..ba3fc64 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -939,7 +939,7 @@ static inline int tcp_prequeue(struct sock *sk, struct sk_buff *skb)
> tp->ucopy.memory = 0;
> } else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
> - wake_up_interruptible_poll(sk->sk_sleep,
> + wake_up_interruptible_sync_poll(sk->sk_sleep,
> if (!inet_csk_ack_scheduled(sk))
> inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,

I suspect this discussion is more of an lkml topic, but anyway...

This wake_up_interruptible_sync_poll() change might be good for loopback
communications (and pleases tbench), but is it desirable for regular
multi-flow NIC traffic?

Ingo can probably answer this question, since he changed
sock_def_readable() (and others) in commit 6f3d09291b498299.
I suspect he missed the tcp_prequeue() case, but maybe not...

sched, net: socket wakeups are sync

'sync' wakeups are a hint towards the scheduler that (certain)
networking related wakeups likely create coupling between tasks.
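
To make the effect of the hint concrete, here is a toy model of the
decision Mike describes above. This is only a sketch under stated
assumptions: struct toy_rq and toy_wake_affine() are hypothetical names,
load is reduced to a plain task count, and none of this is the real
wake_affine() code in the scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy runqueue: load is just a count of runnable tasks. */
struct toy_rq {
	unsigned int nr_running;	/* includes the waker on the waker's CPU */
};

/*
 * Toy stand-in for the affine-wakeup decision (NOT the kernel's
 * wake_affine()).  Returns true if the wakee should be pulled to the
 * waker's CPU, false if it should stay on the CPU where it last ran.
 */
static bool toy_wake_affine(const struct toy_rq *waker_rq,
			    const struct toy_rq *prev_rq, bool sync)
{
	unsigned int waker_load = waker_rq->nr_running;

	/* The waker is running on its own CPU, so the count is at least 1. */
	assert(waker_load >= 1);

	/* Sync hint: the waker is expected to sleep soon, so discount it. */
	if (sync)
		waker_load -= 1;

	/* Go affine only if that does not leave the waker's CPU busier. */
	return waker_load <= prev_rq->nr_running;
}
```

With a lone waker (nr_running = 1) and an idle previous CPU
(nr_running = 0), the toy model refuses the affine wakeup when sync is
false and allows it when sync is true, which is the single-task case the
patch is aimed at.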
