Re: [RFC PATCH] 9p: forbid use of mempool for TFLUSH

From: Christian Schoenebeck
Date: Thu Jul 14 2022 - 15:16:26 EST


On Mittwoch, 13. Juli 2022 06:17:01 CEST Dominique Martinet wrote:
> TFLUSH is called while the thread still holds memory for the
> request we're trying to flush, so mempool alloc can deadlock
> there. With p9_msg_buf_size() rework the flush allocation is
> small so just make it fail if allocation failed; all that does
> is potentially leak the request we're flushing until its reply
> finally does come.. or if it never does until umount.
>
> Signed-off-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
> ---

Patch looks fine on first impression, but I'll postpone testing this. And yes,
I also think that exempting Tflush should be fair. If 4k (soon) cannot be
allocated, then you probably have worse problems than that.

> Here's a concrete version of what I had in mind: literally just make
> allocation fail if the initial alloc failed.
>
> I can't reproduce any bad hang with a sane server here, but we still
> risk hanging with a bad server that ignores flushes as these are still
> unkillable (someday I'll finish my async requests work...)
>
> So ultimately there are two things I'm not so happy about with mempools:
> - this real possibility of client hangs if a server mishandles some
> replies -- this might make fuzzing difficult in particular, I think it's

Concrete example of such a mishap?

> easier to deal with failed IO (as long as it fails all the way back to
> userspace) than to hang forever.
> I'm sure there are others who prefer to wait instead, but I think this
> should at least have a timeout or something.

Not sure if it was easy to pick an appropriate timeout value. I've seen things
slowing down extremely with 9p after a while. But to be fair, these were on
production machines with ancient kernel versions, so maybe already fixed.

A proc interface would be useful though to be able to identify things like
piling up too many fids and other performance related numbers.

> - One of the reasons I wanted to drop the old request cache before is
> that these caches are per mount/connection. If you have a dozen of
> mounts that each cache 4 requests worth as here, with msize=1MB and two
> buffers per request we're locking down 8 * 12 = 96 MB of ram just for
> mounting.
> That being said, as long as hanging is a real risk I'm not comfortable
> sharing the mempools between all the clients either, so I'm not sure
> what to suggest.

Why would a shared mempool increase the chance of a hang or worsen its
outcome?

> Anyway, we're getting close to the next merge request and I don't have
> time to look deeper into this; I'll be putting the mempool patches on
> hold for next cycle at least and we can discuss again if Kent still
> encounters allocation troubles with Christian's smaller buffers
> optimization first.
> These will very likely get in this cycle unless something bad happens,
> I've finished retesting a bit without trouble here.
>
>
> net/9p/client.c | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 928c9f88f204..f9c17fb79f35 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -218,7 +218,7 @@ static int parse_opts(char *opts, struct p9_client
> *clnt) return ret;
> }
>
> -static void p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
> +static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
> int fc_idx, unsigned alloc_msize)
> {
> gfp_t gfp = GFP_NOFS|__GFP_NOWARN;
> @@ -232,11 +232,13 @@ static void p9_fcall_init(struct p9_client *c, struct
> p9_fcall *fc, if (alloc_msize < c->msize)
> fc->sdata = kmalloc(alloc_msize, gfp);
>
> - if (!fc->sdata) {
> + if (!fc->sdata && fc_idx >= 0) {
> fc->sdata = mempool_alloc(&c->pools[fc_idx], gfp);
> fc->used_mempool = true;
> fc->capacity = c->msize;
> }
> +
> + return fc->sdata == NULL;
> }
>
> void p9_fcall_fini(struct p9_client *c, struct p9_fcall *fc,
> @@ -280,6 +282,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, uint
> t_size, uint r_size, struct p9_req_t *req = kmem_cache_alloc(p9_req_cache,
> GFP_NOFS); int alloc_tsize;
> int alloc_rsize;
> + int fc_idx = 0;
> int tag;
> va_list apc;
>
> @@ -294,8 +297,19 @@ p9_tag_alloc(struct p9_client *c, int8_t type, uint
> t_size, uint r_size, if (!req)
> return ERR_PTR(-ENOMEM);
>
> - p9_fcall_init(c, &req->tc, 0, alloc_tsize);
> - p9_fcall_init(c, &req->rc, 1, alloc_rsize);
> + /* We cannot use the mempool for TFLUSH because flushes can be
> + * allocated when the thread still holds memory for the request
> + * we're flushing. A negative fc_idx will make p9_fcall_init
> + * error out.
> + */
> + if (type == P9_TFLUSH) {
> + fc_idx = -2;
> + }
> +
> + if (p9_fcall_init(c, &req->tc, fc_idx, alloc_tsize))
> + goto free_req;
> + if (p9_fcall_init(c, &req->rc, fc_idx + 1, alloc_rsize))
> + goto free;
>
> p9pdu_reset(&req->tc);
> p9pdu_reset(&req->rc);
> @@ -334,6 +348,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, uint
> t_size, uint r_size, free:
> p9_fcall_fini(c, &req->tc, 0);
> p9_fcall_fini(c, &req->rc, 1);
> +free_req:
> kmem_cache_free(p9_req_cache, req);
> return ERR_PTR(-ENOMEM);
> }