Re: [NFS] [PATCH 010 of 11] knfsd: make rpc threads pools numa-aware

From: Andrew Morton
Date: Tue Aug 01 2006 - 00:41:47 EST


On Mon, 31 Jul 2006 15:54:57 +1000
Greg Banks <gnb@xxxxxxxxxxxxxxxxx> wrote:

> On Mon, 2006-07-31 at 14:42, Greg Banks wrote:
> > On Mon, 2006-07-31 at 14:36, Neil Brown wrote:
> > > On Sunday July 30, akpm@xxxxxxxx wrote:
> > > > On Mon, 31 Jul 2006 10:42:34 +1000
> > > > NeilBrown <neilb@xxxxxxx> wrote:
> > > >
> > > > > +static int
> > > > > +svc_pool_map_init_percpu(struct svc_pool_map *m)
> > > > > +{
> > > > > +	unsigned int maxpools = num_possible_cpus();
> > > > > +	unsigned int pidx = 0;
> > > > > +	unsigned int cpu;
> > > > > +	int err;
> > > > > +
> > > > > +
> > > >
> > > > That isn't right - it assumes that cpu_possible_map is not sparse. If it
> > > > is sparse, we allocate undersized pools and then overindex them.
> >
> > Umm, I think Andrew's right, num_possible_cpus() should be NR_CPUS.
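To spell out the failure mode (a sketch: the to_pool[] array here stands
in for the patch's mapping fields and is not its exact code):

/*
 * With a sparse cpu_possible_map of, say, {0, 2, 5},
 * num_possible_cpus() is 3 while cpu ids run up to 5.
 */
unsigned int maxpools = num_possible_cpus();	/* == 3 */
unsigned int *to_pool = kcalloc(maxpools, sizeof(*to_pool), GFP_KERNEL);
unsigned int cpu, pidx = 0;

for_each_possible_cpu(cpu)
	to_pool[cpu] = pidx++;	/* cpu == 5 writes past to_pool[2] */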
>
> How about this version of the patch? It replaces num_possible_cpus()
> with highest_possible_processor_id()+1 and similarly for nodes.
> --
>
> knfsd: Actually implement multiple pools. On NUMA machines, allocate
> a svc_pool per NUMA node; on SMP a svc_pool per CPU; otherwise a single
> global pool. Enqueue sockets on the svc_pool corresponding to the CPU
> on which the socket bh is run (i.e. the NIC interrupt CPU). Threads
> have their cpu mask set to limit them to the CPUs in the svc_pool that
> owns them.
>
> This is the patch that allows an Altix to scale NFS traffic linearly
> beyond 4 CPUs and 4 NICs.
>
> Incorporates changes and feedback from Neil Brown, Trond Myklebust,
> Christoph Hellwig and Andrew Morton.
>
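For context, the global/per-cpu/per-node selection described above
amounts to roughly the following sketch (the SVC_POOL_* names appear in
the patch series; the helper body here is a simplification, not the
patch's exact code):

enum {
	SVC_POOL_GLOBAL,	/* one global pool */
	SVC_POOL_PERCPU,	/* one pool per cpu */
	SVC_POOL_PERNODE,	/* one pool per numa node */
};

static int
svc_pool_map_choose_mode(void)
{
	if (num_online_nodes() > 1)
		return SVC_POOL_PERNODE;	/* NUMA: keep traffic node-local */
	if (num_online_cpus() > 1)
		return SVC_POOL_PERCPU;		/* plain SMP */
	return SVC_POOL_GLOBAL;			/* UP: single pool */
}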

Something has gone rather wrong here.

> -	serv = __svc_create(prog, bufsize, shutdown, /*npools*/1);
> +	serv = __svc_create(prog, bufsize, shutdown, npools);

__svc_create() is:

__svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
	     void (*shutdown)(struct svc_serv *serv))

so heaven knows what tree you're patching.
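Against that signature, the second line of that hunk would presumably
have to read

	serv = __svc_create(prog, bufsize, npools, shutdown);

for the call to even match.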

Incremental patches really are preferred, so we can see what people are
monkeying with ;)

After fixing the rejects and cleaning a few things up, your proposed change
amounts to:

--- a/net/sunrpc/svc.c~knfsd-make-rpc-threads-pools-numa-aware-fix
+++ a/net/sunrpc/svc.c
@@ -116,7 +116,7 @@ fail:
 static int
 svc_pool_map_init_percpu(struct svc_pool_map *m)
 {
-	unsigned int maxpools = num_possible_cpus();
+	unsigned int maxpools = highest_possible_processor_id() + 1;
 	unsigned int pidx = 0;
 	unsigned int cpu;
 	int err;
@@ -136,6 +136,18 @@ svc_pool_map_init_percpu(struct svc_pool
 	return pidx;
 };
 
+static int
+highest_possible_node_id(void)
+{
+	unsigned int node;
+	unsigned int highest = 0;
+
+	for_each_node(node)
+		highest = node;
+
+	return highest;
+}
+
 
 /*
  * Initialise the pool map for SVC_POOL_PERNODE mode.
@@ -144,7 +156,7 @@ svc_pool_map_init_percpu(struct svc_pool
 static int
 svc_pool_map_init_pernode(struct svc_pool_map *m)
 {
-	unsigned int maxpools = num_possible_nodes();
+	unsigned int maxpools = highest_possible_node_id() + 1;
 	unsigned int pidx = 0;
 	unsigned int node;
 	int err;
_


Which shouldn't have compiled, due to the missing forward declaration. And
I'd be surprised if it worked very well with CONFIG_NUMA=n. And it's
naughty to be sneaking general library functions into the sunrpc code
anyway.

Please,

- Write a standalone patch which adds highest_possible_node_id() to
  lib/cpumask.c(?)  (a sketch follows, after this list)

  Make sure it's inside #ifdef CONFIG_NUMA.

  Remember to export it to modules.

  Provide a !CONFIG_NUMA version in include/linux/nodemask.h which just
  returns constant zero.

  Consider doing something more efficient than the for_each_node() loop,
  although I'm not sure what that would be, given that we don't have
  find_last_bit().

- Provide an incremental patch against
  knfsd-make-rpc-threads-pools-numa-aware.patch which utilises
  highest_possible_node_id().

  A replacement patch will be grudgingly accepted, but I'll only go and
  turn it into an incremental one, so you can't hide ;)

- Test it real good. Modular, non-modular, NUMA, non-NUMA, !SMP.
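For the first item, something like this sketch (lib/cpumask.c placement
per the suggestion above; this is what is being asked for, not a merged
patch):

/* lib/cpumask.c */
#ifdef CONFIG_NUMA
int highest_possible_node_id(void)
{
	unsigned int node;
	unsigned int highest = 0;

	for_each_node(node)	/* walks node_possible_map */
		highest = node;
	return highest;
}
EXPORT_SYMBOL(highest_possible_node_id);
#endif

/* include/linux/nodemask.h: the !CONFIG_NUMA version */
#ifdef CONFIG_NUMA
int highest_possible_node_id(void);
#else
#define highest_possible_node_id()	0
#endif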

Thanks.