Re: [PATCH 2/2] net/9p: add a per-client fcall kmem_cache

From: Dominique Martinet
Date: Tue Jul 31 2018 - 00:17:33 EST


Matthew Wilcox wrote on Mon, Jul 30, 2018:
> On Mon, Jul 30, 2018 at 11:34:23AM +0200, Dominique Martinet wrote:
> > -static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
> > +static int p9_fcall_alloc(struct p9_client *c, struct p9_fcall *fc,
> > + int alloc_msize)
> > {
> > - fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> > + if (c->fcall_cache && alloc_msize == c->msize)
> > + fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
> > + else
> > + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
>
> Could you simplify this by initialising c->msize to 0 and then this
> can simply be:
>
> > + if (alloc_msize == c->msize)
> ...

Hmm, this is rather tricky with the current flow of things;
p9_client_version() has multiple uses for that msize field.

Basically what happens is:
- init client struct, set c->msize to the mount option clipped to the
transport-specific max
- p9_client_version() uses current c->msize to send a suggested value
to the server
- p9_client_rpc() uses current c->msize to allocate that first rpc,
this is pretty much hard-coded and will be quite intrusive to make an
exception for
- p9_client_version() looks at the msize the server suggested and
lowers c->msize to it if the reply's value is smaller
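
To make the flow above concrete, here is a rough userspace sketch. It is
not the kernel code: struct client_model, client_init() and
client_version() are hypothetical names that only model how the single
msize field is reused at each step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the msize field of struct p9_client. */
struct client_model {
	unsigned int msize;
};

/* Step 1: start from the mount option, clipped to the transport max. */
static void client_init(struct client_model *c, unsigned int opt_msize,
			unsigned int trans_max)
{
	assert(c != NULL);
	c->msize = opt_msize < trans_max ? opt_msize : trans_max;
}

/*
 * Steps 2-4: the same field is the value suggested to the server, the
 * size of the first rpc buffer, and finally the negotiated value.
 * Returns the size the first rpc buffer was allocated with.
 */
static unsigned int client_version(struct client_model *c,
				   unsigned int server_msize)
{
	unsigned int first_rpc_size;

	assert(c != NULL);
	first_rpc_size = c->msize;	/* also the value sent in TVERSION */
	if (server_msize < c->msize)
		c->msize = server_msize;	/* clip to the server's reply */
	return first_rpc_size;
}
```

In this model the first buffer is sized with the pre-negotiation value,
so zeroing msize before the version exchange would lose both the
suggested value and the size of that first allocation.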


I kind of agree it'd be nice to avoid doing that check on every
allocation when it only matters at startup, but I don't see how to do
this easily with the current code.

Making p9_client_version() take an extra argument would be easy, but we'd
need to actually hardcode in p9_client_rpc() that "if the message type is
TVERSION then use [page size or whatever] for allocation" and that kind
of kills the point... The alternative is to have p9_client_rpc() take
the actual size as an argument itself, but this once again is pretty
intrusive even if it could be done mechanically...

I'll think about this some more

> > +void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc)
> > +{
> > + /* sdata can be NULL for interrupted requests in trans_rdma,
> > + * and kmem_cache_free does not do NULL-check for us
> > + */
> > + if (unlikely(!fc->sdata))
> > + return;
> > +
> > + if (c->fcall_cache && fc->capacity == c->msize)
> > + kmem_cache_free(c->fcall_cache, fc->sdata);
> > + else
> > + kfree(fc->sdata);
> > +}
>
> Is it possible for fcall_cache to be allocated before fcall_free is
> called? I'm concerned we might do this:
>
> allocate message A
> allocate message B
> receive response A
> allocate fcall_cache
> receive response B
>
> and then we'd call kmem_cache_free() for something allocated by kmalloc(),
> which works with slab and slub, but doesn't work with slob (alas).

Bleh, I checked this would work for slab and didn't really check
others..

This cannot happen right now because we only return the client struct
from p9_client_create() after the first message is done (and, right now,
freed). Once we start adding refcounting to requests, though, it'd be
possible to free the very first response after fcall_cache is allocated,
e.g. with a "bad" server like the ones syzkaller emulates that sends the
version reply before the request came in.

I can't see any workaround for this other than storing how the fcall
was allocated in the struct itself though...
I guess I might as well do that now, unless you have a better idea.
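
Something along these lines, as a userspace sketch only. The from_cache
field and the *_model structs are assumptions made up for illustration,
malloc()/free() stand in for both the kmem_cache and kmalloc paths, and
the counters only exist to show which path a buffer is accounted to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct p9_fcall; from_cache is the new field. */
struct fcall_model {
	size_t capacity;
	bool from_cache;	/* records which allocator produced sdata */
	char *sdata;
};

/* Hypothetical stand-in for struct p9_client. */
struct client_model {
	size_t msize;
	bool cache_ready;	/* models c->fcall_cache != NULL */
	int cache_live;		/* buffers owned by the cache path */
	int heap_live;		/* buffers owned by the kmalloc path */
};

static int fcall_alloc(struct client_model *c, struct fcall_model *fc,
		       size_t alloc_msize)
{
	/* Decide once, at allocation time, and remember the decision. */
	fc->from_cache = c->cache_ready && alloc_msize == c->msize;
	fc->sdata = malloc(alloc_msize);
	if (!fc->sdata)
		return -1;
	fc->capacity = alloc_msize;
	if (fc->from_cache)
		c->cache_live++;
	else
		c->heap_live++;
	return 0;
}

static void fcall_free(struct client_model *c, struct fcall_model *fc)
{
	if (!fc->sdata)
		return;
	/* Use the recorded origin, not the client's current state. */
	if (fc->from_cache) {
		assert(c->cache_live > 0);
		c->cache_live--;
	} else {
		assert(c->heap_live > 0);
		c->heap_live--;
	}
	free(fc->sdata);
	fc->sdata = NULL;
}
```

With this, a buffer allocated before the cache exists is still released
through the path it came from, even if the cache has been created by
the time the response arrives.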


> > @@ -980,6 +1000,9 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
> > if (err)
> > goto close_trans;
> >
> > + clnt->fcall_cache = kmem_cache_create("9p-fcall-cache", clnt->msize,
> > + 0, 0, NULL);
> > +
>
> If we have slab merging turned off, or we have two mounts from servers
> with different msizes, we'll end up with two slabs called 9p-fcall-cache.
> I'm OK with that, but are you?

Yeah, the reason I didn't make it global like p9_req_cache is precisely
to get two separate caches if the msizes are different.

I actually considered adding msize to the string with snprintf or
something, but someone looking at it through slabinfo or similar will
have the sizes anyway, so I don't think this would bring anything.
Do you know if (or think that) tools will choke on multiple caches with
the same name?
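
For reference, the naming variant I decided against would look roughly
like this. It is a userspace sketch of the string formatting only;
fcall_cache_name() and the "9p-fcall-cache-<msize>" format are
hypothetical, not something the patch does.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a per-msize cache name such as
 * "9p-fcall-cache-8192". Returns 0 on success, -1 if buf is too small.
 */
static int fcall_cache_name(char *buf, size_t len, unsigned int msize)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "9p-fcall-cache-%u", msize);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```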


I'm not sure about slab merging being disabled though, from the little I
understand I do not see why anyone would do that except for debugging,
and I'm fine with that.
Please let me know if I'm missing something though!


Thanks for the review,
--
Dominique Martinet