Re: [PATCH 02/10] nfsd: drain callbacks and clear cl_cb_session

From: Chuck Lever

Date: Fri May 29 2026 - 11:45:28 EST

On Thu, May 28, 2026, at 5:55 PM, Jeff Layton wrote:
> From: Chris Mason <clm@xxxxxxxx>
>
> After a DESTROY_SESSION the per-session teardown path can free a
> session while rpciod still holds an inflight callback rpc_task that
> dereferences clp->cl_cb_session. nfsd4_probe_callback_sync() flushes
> cl_callback_wq, but once nfsd4_run_cb_work() has called
> rpc_call_async() the rpc_task lives on rpciod; flushing the workqueue
> does not wait for it. After the flush returns,
> nfsd4_destroy_session() proceeds through nfsd4_put_session_locked()
> and free_session() kfree()s the slab while rpciod's
> nfsd4_cb_sequence_done(), grab_slot(), and nfsd41_cb_release_slot()
> are still dereferencing cb->cb_clp->cl_cb_session.
>
>   destroy path                          rpciod
>   ------------                          ------
>   unhash_session(ses)
>   nfsd4_probe_callback_sync(clp)
>     flush_workqueue(cl_callback_wq)
>     /* returns; rpc_task still live */
>   nfsd4_put_session_locked(ses)
>     free_session(ses) -> kfree(ses)
>                                         nfsd4_cb_sequence_done()
>                                           reads cb_clp->cl_cb_session
>                                           /* freed slab */
>
> A second window exists in nfsd4_process_cb_update(). When
> __nfsd4_find_backchannel() returns NULL because unhash_session() has
> already removed the destroyed session from cl_sessions,
> setup_callback_client() takes the v4.1 early return
>
>     if (!conn->cb_xprt || !ses)
>         return -EINVAL;
>
> so clp->cl_cb_session = ses never fires and the field retains a
> pointer to the about-to-be-freed session. Symmetrically, if a later
> probe finds a different session's backchannel conn and that
> setup_callback_client() call fails, the error tail must still scrub
> any previously published cl_cb_session.
>
> Fix by mirroring the two-stage drain that nfsd4_shutdown_callback()
> already performs: call nfsd41_cb_inflight_wait_complete() in
> nfsd4_probe_callback_sync() after flush_workqueue() so rpciod-side
> nfsd41_cb_inflight_end() decrements are observed before the caller
> releases the final session reference. The two direct callers,
> nfsd4_destroy_session() and nfsd4_init_conn() (itself invoked from
> nfsd4_create_session() and nfsd4_bind_conn_to_session()), run in
> sleepable process context and tolerate the wait_var_event() sleep:
>
>   nfsd4_destroy_session() (fs/nfsd/nfs4state.c):
>     unhash_session(ses);
>     spin_unlock(&nn->client_lock); /* spinlock dropped */
>     nfsd4_probe_callback_sync(ses->se_client);
>
>   nfsd4_init_conn() (fs/nfsd/nfs4state.c):
>     acquires no locks in its body; calls nfsd4_hash_conn(),
>     nfsd4_register_conn(), then nfsd4_probe_callback_sync() --
>     entirely in sleepable process context.
>
> Also clear clp->cl_cb_session unconditionally on the
> nfsd4_process_cb_update() error return so every
> setup_callback_client() failure -- whether c is NULL or points at a
> different session whose probe failed -- leaves the field NULL rather
> than pointing at a session that may subsequently be freed.
>
> Fixes: dcbeaa68dbbd ("nfsd4: allow backchannel recovery")
> Assisted-by: kres:claude-opus-4-7
> Signed-off-by: Chris Mason <clm@xxxxxxxx>
> ---
> fs/nfsd/nfs4callback.c | 21 +++++++++++++++++----
> 1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 1964a213f80e..1cf6b6100357 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1205,9 +1205,8 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
> } else {
> if (!conn->cb_xprt || !ses)
> return -EINVAL;
> - clp->cl_cb_session = ses;
> args.bc_xprt = conn->cb_xprt;
> - args.prognumber = clp->cl_cb_session->se_cb_prog;
> + args.prognumber = ses->se_cb_prog;
> args.protocol = conn->cb_xprt->xpt_class->xcl_ident |
> XPRT_TRANSPORT_BC;
> args.authflavor = ses->se_cb_sec.flavor;
> @@ -1225,8 +1224,10 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
> return -ENOMEM;
> }
>
> - if (clp->cl_minorversion != 0)
> + if (clp->cl_minorversion != 0) {
> clp->cl_cb_conn.cb_xprt = conn->cb_xprt;
> + clp->cl_cb_session = ses;
> + }
> clp->cl_cb_client = client;
> clp->cl_cb_cred = cred;
> rcu_read_lock();
> @@ -1299,6 +1300,7 @@ void nfsd4_probe_callback_sync(struct nfs4_client *clp)
> {
> nfsd4_probe_callback(clp);
> flush_workqueue(clp->cl_callback_wq);
> + nfsd41_cb_inflight_wait_complete(clp);
> }
>
> void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *conn)
> @@ -1679,7 +1681,17 @@ static struct nfsd4_conn * __nfsd4_find_backchannel(struct nfs4_client *clp)
> * Note there isn't a lot of locking in this code; instead we depend on
> * the fact that it is run from clp->cl_callback_wq, which won't run two
> * work items at once. So, for example, clp->cl_callback_wq handles all
> - * access of cl_cb_client and all calls to rpc_create or rpc_shutdown_client.
> + * access of cl_cb_client and cl_cb_session, and all calls to rpc_create
> + * or rpc_shutdown_client.
> + *
> + * rpciod-side readers of cl_cb_session (encode_cb_sequence4args(),
> + * nfsd4_cb_sequence_done(), the cb-slot helpers, and the cb_sequence
> + * tracepoints) run outside cl_callback_wq. The
> + * nfsd41_cb_inflight_wait_complete() drain in nfsd4_probe_callback_sync()
> + * waits until cl_cb_inflight reaches zero before the caller proceeds with
> + * session teardown; any rpc_task that reads cl_cb_session must hold an
> + * inflight pin (via nfsd41_cb_inflight_begin) for this fence to be
> + * effective.
> */
> static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
> {
> @@ -1731,6 +1743,7 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
> nfsd4_mark_cb_down(clp);
> if (c)
> svc_xprt_put(c->cn_xprt);
> + clp->cl_cb_session = NULL;
> return;
> }
> }
>
> --
> 2.54.0

Several NFSD callback done handlers retry indefinitely on
NFS4ERR_DELAY via rpc_delay(). A client that keeps replying
DELAY therefore holds the per-client cl_cb_inflight counter
above zero, and the new nfsd41_cb_inflight_wait_complete()
call blocks the foreground CREATE_SESSION,
BIND_CONN_TO_SESSION, or DESTROY_SESSION request even when
the callback no longer references the session being torn
down.

This is partly a consequence of how callbacks are currently
structured, but the patch potentially introduces a
client-controlled DoS vector.
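
To make the concern concrete, here is a toy userspace model in C.
It is a sketch only, not kernel code: toy_client, toy_cb_done(),
and toy_drain() are hypothetical names, the inflight field merely
stands in for cl_cb_inflight, and the round limit exists only so
the model terminates (the wait added by the patch has no such
bound).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reply codes standing in for callback results. */
enum toy_status { TOY_OK, TOY_DELAY };

/* Toy stand-in for per-client state; NOT the real struct nfs4_client. */
struct toy_client {
	int inflight;       /* models the per-client inflight counter */
	int delay_replies;  /* DELAY replies still to come; -1 means forever */
};

/* Models a done handler: on DELAY the task is retried and keeps its pin. */
static void toy_cb_done(struct toy_client *clp)
{
	enum toy_status st = TOY_OK;

	if (clp->delay_replies != 0) {
		st = TOY_DELAY;
		if (clp->delay_replies > 0)
			clp->delay_replies--;
	}
	if (st == TOY_DELAY)
		return;         /* retried later; inflight is not decremented */
	clp->inflight--;        /* callback finished; drop the pin */
}

/*
 * Models the drain: keep servicing callbacks until the counter is zero.
 * max_rounds is an assumption of this model so that it terminates.
 * Returns true if the counter reached zero within max_rounds.
 */
static bool toy_drain(struct toy_client *clp, int max_rounds)
{
	assert(max_rounds >= 0);
	for (int i = 0; i < max_rounds && clp->inflight > 0; i++)
		toy_cb_done(clp);
	return clp->inflight == 0;
}
```

In this model a client that eventually stops replying DELAY lets
toy_drain() finish, while one with delay_replies set to -1 keeps
the counter at 1 for as many rounds as the caller is willing to
wait. The pinned callback need not reference the session being
destroyed for the wait to stall.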


--
Chuck Lever