Re: [RFC PATCH 00/14] nfsd/sunrpc: add support for a workqueue-based nfsd

From: Jeff Layton
Date: Wed Dec 03 2014 - 14:02:12 EST

On Wed, 3 Dec 2014 11:04:05 -0500
Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:

> On Wed, 3 Dec 2014 10:56:49 -0500
> Tejun Heo <tj@xxxxxxxxxx> wrote:
> > Hello, Neil, Jeff.
> >
> > On Tue, Dec 02, 2014 at 08:29:46PM -0500, Jeff Layton wrote:
> > > That's a good point. I had originally thought that max_active on an
> > > unbound workqueue would be the number of concurrent jobs that could run
> > > across all the CPUs, but now that I look I'm not sure that's really
> > > the case.
> >
> > @max_active is a per-pool number. By default, unbound wqs use
> > per-node pools, so @max_active would be per-node. Currently,
> > @max_active is mostly meant as a protection against runaway
> > workqueues creating a crazy number of workers, which has been enough
> > for the existing wq users. *Maybe* it makes sense to make it actually
> > mean maximum concurrency, which would probably involve an aggregated
> > per-cpu distribution mechanism so that we don't end up inc'ing and
> > dec'ing the same counter from all CPUs on each work item execution.
> >
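Tejun's aside about not inc'ing and dec'ing one shared counter from every CPU describes the general per-CPU (sharded) counter pattern. The following is a minimal userspace C11 sketch of that pattern only, not kernel code and not a proposed workqueue change. `NR_SLOTS`, `struct pcpu_count` and the `pcpu_*` helpers are hypothetical names, a 64-byte cache line is assumed, and the "CPU" is simply an index passed in by the caller. Updates touch only the caller's own slot; only the read path walks every slot.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical slot count; a real implementation would size this from
 * the number of possible CPUs. */
#define NR_SLOTS 8

/* Each slot gets its own (assumed 64-byte) cache line so that updates
 * from different CPUs do not contend on the same line. */
struct pcpu_slot {
	_Alignas(64) _Atomic long count;
};

struct pcpu_count {
	struct pcpu_slot slot[NR_SLOTS];
};

static void pcpu_init(struct pcpu_count *c)
{
	for (size_t i = 0; i < NR_SLOTS; i++)
		atomic_init(&c->slot[i].count, 0);
}

/* Fast path: a CPU only ever touches its own slot. */
static void pcpu_inc(struct pcpu_count *c, unsigned int cpu)
{
	atomic_fetch_add_explicit(&c->slot[cpu % NR_SLOTS].count, 1,
				  memory_order_relaxed);
}

static void pcpu_dec(struct pcpu_count *c, unsigned int cpu)
{
	atomic_fetch_sub_explicit(&c->slot[cpu % NR_SLOTS].count, 1,
				  memory_order_relaxed);
}

/* Slow path: aggregate all slots. A single slot may go negative (an
 * item started on one CPU and finished on another); only the sum is
 * meaningful. */
static long pcpu_sum(struct pcpu_count *c)
{
	long sum = 0;

	for (size_t i = 0; i < NR_SLOTS; i++)
		sum += atomic_load_explicit(&c->slot[i].count,
					    memory_order_relaxed);
	return sum;
}
```

The sum is only a snapshot while other CPUs are still updating their slots, so enforcing an exact concurrency limit on top of it would need extra care.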
> > However, I do agree with Neil that making it user configurable is
> > almost always painful. It's usually a question without a good answer,
> > and the same value may behave differently depending on a lot of
> > implementation details. A better approach, probably, is to use
> > @max_active as the last-resort protection mechanism while providing
> > automatic throttling of in-flight work items that is meaningful for
> > the specific use case.
> >
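The thread does not spell out what "meaningful" throttling would look like for nfsd. One simple form, assumed here purely for illustration, is a cap on the number of work items a given workqueue user has in flight, checked before each item is queued, with @max_active left as the backstop. `struct inflight_throttle`, `throttle_try_get()` and `throttle_put()` are hypothetical userspace helpers, not an existing kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-user throttle: caps this user's in-flight work
 * items independently of the workqueue's @max_active. */
struct inflight_throttle {
	_Atomic int inflight;
	int limit;
};

static void throttle_init(struct inflight_throttle *t, int limit)
{
	atomic_init(&t->inflight, 0);
	t->limit = limit;
}

/* Reserve a slot before queueing a work item. The compare-and-swap
 * loop ensures the count never exceeds @limit even with concurrent
 * callers. Returns false if the caller should back off, e.g. stop
 * pulling new requests off the transport for now. */
static bool throttle_try_get(struct inflight_throttle *t)
{
	int cur = atomic_load(&t->inflight);

	while (cur < t->limit) {
		if (atomic_compare_exchange_weak(&t->inflight, &cur, cur + 1))
			return true;
		/* a failed CAS reloaded cur; re-check against the limit */
	}
	return false;
}

/* Release the slot when the work item has finished. */
static void throttle_put(struct inflight_throttle *t)
{
	atomic_fetch_sub(&t->inflight, 1);
}
```

A single shared counter keeps the example short and exact, but it is exactly the cross-CPU cache-line traffic Tejun mentions wanting to avoid, so as written it would not scale well to many CPUs.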
> > > I've heard random grumblings from various people in the past that
> > > workqueues have significant latency, but this is the first time I've
> > > really hit it in practice. If we can get this fixed, then that may be a
> > > significant perf win for all workqueue users. For instance, rpciod in
> > > the NFS client is all workqueue-based. Getting that latency down could
> > > really help things.
> > >
> > > I'm currently trying to roll up a kernel module for benchmarking the
> > > workqueue dispatching code in the hopes that we can use that to help
> > > nail it down.
> >
> > Definitely. There have been some reports, but nothing really got
> > tracked down properly. It'd be awesome to actually find out where the
> > latency is coming from.
> >
> > Thanks!
> >
> I think I might have figured this out (and before I go any further,
> allow me to say <facepalm>), thanks to the workqueue tracepoints in the
> code. What I noticed is that when things are fairly idle, the work is
> picked up quickly, but once things get busy it takes a lot longer.
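The message doesn't say exactly how the trace was analyzed. One way to put a number on "picked up quickly" versus "takes a lot longer" is to pair each workqueue_queue_work event with the workqueue_execute_start event for the same work_struct address and take the difference. The sketch below assumes the trace has already been parsed into a hypothetical `struct wq_event` array sorted by timestamp; the parsing step and all names here are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-parsed trace record. The two kinds mirror the
 * workqueue_queue_work and workqueue_execute_start tracepoints. */
enum wq_ev { WQ_QUEUE, WQ_EXEC_START };

struct wq_event {
	enum wq_ev kind;
	uintptr_t work;		/* address of the work_struct */
	uint64_t ts_ns;		/* event timestamp in nanoseconds */
};

#define MAX_PENDING 64

/* Return the worst queue-to-start latency (ns) over an event stream
 * sorted by timestamp. A work item can't be queued again while it is
 * still pending, so one outstanding queue timestamp per work address
 * is enough to match events. */
static uint64_t max_dispatch_latency(const struct wq_event *ev, size_t n)
{
	uintptr_t work[MAX_PENDING];
	uint64_t queued_at[MAX_PENDING];
	size_t npending = 0;
	uint64_t worst = 0;

	for (size_t i = 0; i < n; i++) {
		if (ev[i].kind == WQ_QUEUE) {
			if (npending < MAX_PENDING) {	/* drop if table full */
				work[npending] = ev[i].work;
				queued_at[npending] = ev[i].ts_ns;
				npending++;
			}
			continue;
		}
		/* WQ_EXEC_START: find the matching queue event */
		for (size_t j = 0; j < npending; j++) {
			if (work[j] != ev[i].work)
				continue;
			uint64_t lat = ev[i].ts_ns - queued_at[j];
			if (lat > worst)
				worst = lat;
			/* remove entry j by moving the last entry into it */
			npending--;
			work[j] = work[npending];
			queued_at[j] = queued_at[npending];
			break;
		}
	}
	return worst;
}
```

Plotting this latency against load, rather than only taking the maximum, would show the idle-versus-busy difference directly.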
>
> I think that the issue is in the design of the workqueue-based nfsd
> code. In particular, I attached a work_struct to the svc_xprt, which
> limits the code to processing only one RPC at a time per xprt, from
> beginning to end.
>
> So, even if we requeue that work after the receive phase is done, the
> workqueue won't pick it up again until the RPC has been processed and
> the reply has been sent.
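The behaviour described here follows from two workqueue properties: queue_work() returns false and does nothing if the item is already pending, and a work item is not executed concurrently with itself, so a requeue issued while it runs is only serviced after the current invocation returns. The toy state machine below models just those two rules in userspace; `struct toy_work` and the `toy_*` functions are hypothetical and ignore everything else the real workqueue code does (pools, per-node placement, flushing).

```c
#include <stdbool.h>

/* Toy model, not the kernel implementation: only the "already
 * pending" and "never runs concurrently with itself" rules. */
struct toy_work {
	bool pending;
	bool running;
};

/* Like queue_work(): returns false if the item was already pending. */
static bool toy_queue(struct toy_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* A worker may start the item only if it is pending and not already
 * executing. Starting clears the pending flag, so the item can be
 * requeued from inside its own callback. */
static bool toy_try_start(struct toy_work *w)
{
	if (!w->pending || w->running)
		return false;
	w->pending = false;
	w->running = true;
	return true;
}

/* The callback has returned. */
static void toy_finish(struct toy_work *w)
{
	w->running = false;
}
```

Mapped onto the svc_xprt case: the requeue after the receive phase succeeds, but the next start attempt fails until toy_finish() runs, i.e. until the reply for the current RPC has gone out.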
>
> What I think I need to do is handle the receive phase using the
> work_struct attached to the xprt, and then do the rest of the
> processing from the context of a different work_struct (possibly one
> attached to the svc_rqst), which should free up the xprt's work_struct
> sooner.
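A heavily simplified, single-threaded sketch of the proposed split: the xprt's work item only receives a request, hands it to a separate per-request work item, and requeues itself if more input is waiting. A FIFO array stands in for the workqueue and each entry stands in for a work_struct. `struct toy_xprt`, `struct toy_rqst`, `xprt_recv_work()` and `rqst_process_work()` are hypothetical and are not the sunrpc/nfsd API.

```c
#include <stddef.h>

/* Stand-in for a workqueue: a FIFO of (function, argument) jobs.
 * No overflow handling; fine for this small demo only. */
struct job {
	void (*fn)(void *arg);
	void *arg;
};

#define QLEN 32
static struct job jobq[QLEN];
static size_t qhead, qtail;

static void enqueue(void (*fn)(void *), void *arg)
{
	jobq[qtail++ % QLEN] = (struct job){ fn, arg };
}

/* Run one queued job; returns 0 when the queue is empty. */
static int run_one(void)
{
	if (qhead == qtail)
		return 0;
	struct job j = jobq[qhead++ % QLEN];
	j.fn(j.arg);
	return 1;
}

#define MAX_RQST 8

struct toy_rqst { int xid; int replied; };

struct toy_xprt {
	int next_xid, last_xid;		/* RPCs waiting on the socket */
	struct toy_rqst rqst[MAX_RQST];	/* preallocated requests */
	size_t nrqst;
};

/* Processing phase: runs from the per-request work item. */
static void rqst_process_work(void *arg)
{
	struct toy_rqst *r = arg;

	r->replied = 1;			/* decode, execute, send reply */
}

/* Receive phase: runs from the xprt's work item and returns quickly. */
static void xprt_recv_work(void *arg)
{
	struct toy_xprt *x = arg;

	/* nothing to read, or out of preallocated requests (toy limit) */
	if (x->next_xid > x->last_xid || x->nrqst >= MAX_RQST)
		return;
	struct toy_rqst *r = &x->rqst[x->nrqst++];
	r->xid = x->next_xid++;
	r->replied = 0;
	enqueue(rqst_process_work, r);		/* hand off the rest of the RPC */
	if (x->next_xid <= x->last_xid)
		enqueue(xprt_recv_work, x);	/* xprt free for the next receive */
}
```

With real workers, the point is that right after the first receive, both the processing of RPC 1 and the receive of RPC 2 are runnable at once, instead of the second receive waiting for RPC 1's reply. The cost is that every RPC now passes through the queue twice.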
>
> I'm going to work on changing that today and see if it improves things.
>
> Thanks for the help so far!

Yes! That does help. The new workqueue-based code is a little (a few
percent?) slower than the thread-based code across the board. I suspect
that's because I have to queue each RPC to the workqueue twice (once
for the receive and once to do the processing).

I suspect that I can remedy that, but I'll have to think about the best
way to do it.

Thanks again for the help!
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>