Re: [RFC PATCH 00/14] nfsd/sunrpc: add support for a workqueue-based nfsd

From: Jeff Layton
Date: Tue Dec 02 2014 - 20:30:18 EST


On Wed, 3 Dec 2014 12:11:18 +1100
NeilBrown <neilb@xxxxxxx> wrote:

> On Tue, 2 Dec 2014 13:24:09 -0500 Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> wrote:
>
> > tl;dr: this code works and is much simpler than the dedicated thread
> > pool, but there are some latencies in the workqueue code that
> > seem to keep it from being as fast as it could be.
> >
> > This patchset is a little skunkworks project that I've been poking at
> > for the last few weeks. Currently nfsd uses a dedicated thread pool to
> > handle RPCs, but that requires maintaining a rather large swath of
> > "fiddly" code to handle the threads and transports.
> >
> > This patchset represents an alternative approach, which makes nfsd use
> > workqueues to do its bidding rather than a dedicated thread pool. When a
> > transport needs to do work, we simply queue it to the workqueue in
> > softirq context and let it service the transport.
> >
> > The current draft is runtime-switchable via a new sunrpc pool_mode
> > module parameter setting. When that's set to "workqueue", nfsd will use
> > a workqueue-based service. One of the goals of this patchset was to
> > *not* need to change any userland code, so starting it up using rpc.nfsd
> > still works as expected. The only real difference is that the nfsdfs
> > "threads" file is reinterpreted as the "max_active" value for the
> > workqueue.
>
> Hi Jeff,
> I haven't looked very closely at the code, but in principal I think this is
> an excellent idea. Having to set a number of threads manually was never
> nice as it is impossible to give sensible guidance on what an appropriate
> number is.

Yes, this should be a lot more "dynamic" in principle. A lightly-used
nfs server using workqueues should really not consume much at all in
the way of resources.

> Tying max_active to "threads" doesn't really make sense I think.
> "max_active" is a per-cpu number and I think the only meaningful numbers are
> "1" (meaning concurrent works might mutually deadlock) or infinity (which is
> approximated as 512). I would just ignore the "threads" number when
> workqueues are used.... or maybe enable workqueues when "auto" is written to
> "threads"??
>

That's a good point. I had originally thought that max_active on an
unbound workqueue would be the number of concurrent jobs that could run
across all the CPUs, but now that I look I'm not sure that's really
the case.

Certainly, not bounding this at all has some appeal, but having some
limit here does help put an upper bound on the size of the svc_rqst
cache.

I'll definitely give that some thought.

> Using a shrinker to manage the allocation and freeing of svc_rqst is a
> really good idea. It will put pressure on the effective number of threads
> when needed, but will not artificially constrain things.
>

That's my hope. We might also consider reaping idle rqsts ourselves
as needed somehow, but for now I decided not to worry about it until we
see whether this is really viable or not.

> The combination of workqueue and shrinker seems like a perfect match for
> nfsd.
>
> I hope you can work out the latency issues!
>
> Thanks,
> NeilBrown

I've heard random grumblings from various people in the past that
workqueues have significant latency, but this is the first time I've
really hit it in practice. If we can get this fixed, then that may be a
significant perf win for all workqueue users. For instance, rpciod in
the NFS client is all workqueue-based. Getting that latency down could
really help things.

I'm currently trying to roll up a kernel module for benchmarking the
workqueue dispatching code in the hopes that we can use that to help
nail it down.

Thanks for having a look!
--
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>

Attachment: pgpTcpQuyauby.pgp
Description: OpenPGP digital signature