Re: [PATCH 1/2] IPVS: add wlib & wlip schedulers

From: Julian Anastasov
Date: Mon Jan 19 2015 - 18:18:14 EST



Hello,

On Sat, 17 Jan 2015, Chris Caputo wrote:

> From: Chris Caputo <ccaputo@xxxxxxx>
>
> IPVS wlib (Weighted Least Incoming Byterate) and wlip (Weighted Least Incoming
> Packetrate) schedulers, updated for 3.19-rc4.

The IPVS estimator uses a 2-second timer to update
the stats; isn't that a problem for such schedulers?
Also, you schedule by the incoming traffic rate, which is
ok when clients mostly upload. But in the common case
clients mostly download, and IPVS processes the download
traffic only for the NAT method.

Maybe not a very useful idea: use the sum of both directions,
i.e. inbps + outbps, or control it with
svc->flags & IP_VS_SVC_F_SCHED_WLIB_xxx flags; see how the
"sh" scheduler supports flags.

Another problem: pps and bps are shifted values;
see how ip_vs_read_estimator() reads them. The comments in
ip_vs_est.c say that this code handles a couple of
gigabits. Maybe inbps and outbps in struct ip_vs_estimator
should be changed to u64 to support more gigabits, in a
separate patch.

> Signed-off-by: Chris Caputo <ccaputo@xxxxxxx>
> ---
> +++ linux-3.19-rc4/net/netfilter/ipvs/ip_vs_wlib.c 2015-01-17 22:47:35.421861075 +0000

> +/* Weighted Least Incoming Byterate scheduling */
> +static struct ip_vs_dest *
> +ip_vs_wlib_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
> + struct ip_vs_iphdr *iph)
> +{
> + struct list_head *p, *q;
> + struct ip_vs_dest *dest, *least = NULL;
> + u32 dr, lr = -1;
> + int dwgt, lwgt = 0;

To get a u64 result from a 32x32 multiply, we can
change the vars as follows:

u32 dwgt, lwgt = 0;

> + spin_lock_bh(&svc->sched_lock);
> + p = (struct list_head *)svc->sched_data;
> + p = list_next_rcu(p);

Note that dests are deleted from svc->destinations
outside of any lock (from __ip_vs_unlink_dest); the
svc->sched_lock taken above protects only svc->sched_data.

So an RCU dereference is needed here; list_next_rcu is
not enough. Better to stick to the list walking used by the
rr algorithm in ip_vs_rr.c.

> + q = p;
> + do {
> + /* skip list head */
> + if (q == &svc->destinations) {
> + q = list_next_rcu(q);
> + continue;
> + }
> +
> + dest = list_entry_rcu(q, struct ip_vs_dest, n_list);
> + dwgt = atomic_read(&dest->weight);

This will be dwgt = (u32) atomic_read(&dest->weight);

> + if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) && dwgt > 0) {
> + spin_lock(&dest->stats.lock);
> + dr = dest->stats.ustats.inbps;
> + spin_unlock(&dest->stats.lock);
> +
> + if (!least ||
> + (u64)dr * (u64)lwgt < (u64)lr * (u64)dwgt ||

This will be (u64)dr * lwgt < (u64)lr * dwgt ||

See commit c16526a7b99c1c for 32x32 multiply.

> + (dr == lr && dwgt > lwgt)) {

Above check is redundant: when dr == lr > 0 and
dwgt > lwgt, the multiply check above already matches.

> + least = dest;
> + lr = dr;
> + lwgt = dwgt;
> + svc->sched_data = q;

Better to update sched_data once at the end, see below:

> + }
> + }
> + q = list_next_rcu(q);
> + } while (q != p);

	if (least)
		svc->sched_data = &least->n_list;

> + spin_unlock_bh(&svc->sched_lock);

The same comments apply to wlip.

Regards

--
Julian Anastasov <ja@xxxxxx>