Re: [PATCH v6 10/20] nfsd: add notification handlers for dir events

From: Chuck Lever

Date: Fri Jun 12 2026 - 13:52:34 EST



On Thu, Jun 11, 2026, at 1:50 PM, Jeff Layton wrote:
> Add the necessary parts to accept a fsnotify callback for directory
> change event and create a CB_NOTIFY request for it. When a dir nfsd_file
> is created set a handle_event callback to handle the notification.
>
> Use that to allocate a nfsd_notify_event object and then hand off a
> reference to each delegation's CB_NOTIFY. If anything fails along the
> way, recall any affected delegations.
>
> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> ---

> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index ca4dd2f969eb..59378751d596 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c

> @@ -904,13 +908,45 @@ static void nfs4_xdr_enc_cb_notify(struct rpc_rqst *req,
> encode_cb_sequence4args(xdr, cb, &hdr);
>
> /*
> - * FIXME: get stateid and fh from delegation. Inline the cna_changes
> - * buffer, and zero it.
> + * nfsd4_cb_notify_prepare() sized the payload against a single page,
> + * but did not account for the compound, sequence, stateid, and
> + * filehandle encoded here. If the variable-length encode overflows the
> + * backchannel send buffer, roll back to before the operation so that a
> + * truncated CB_NOTIFY is never placed on the wire.
> */
> - xdrgen_encode_CB_NOTIFY4args(xdr, &args);
> + start = xdr_stream_pos(xdr);
> +
> + p = xdr_reserve_space(xdr, 4);
> + if (!p)
> + goto out_err;
> + *p = cpu_to_be32(OP_CB_NOTIFY);

Please use xdr_stream_encode_u32 for this purpose.


> +
> + args.cna_stateid.seqid = dp->dl_stid.sc_stateid.si_generation;
> + memcpy(&args.cna_stateid.other, &dp->dl_stid.sc_stateid.si_opaque,
> + ARRAY_SIZE(args.cna_stateid.other));
> + args.cna_fh.len = dp->dl_stid.sc_file->fi_fhandle.fh_size;
> + args.cna_fh.data = dp->dl_stid.sc_file->fi_fhandle.fh_raw;
> + args.cna_changes.count = ncn->ncn_nf_cnt;
> + args.cna_changes.element = ncn->ncn_nf;
> + if (!xdrgen_encode_CB_NOTIFY4args(xdr, &args))
> + goto out_err;
>
> hdr.nops++;
> encode_cb_nops(&hdr);
> + return;
> +
> +out_err:
> + /*
> + * Drop the CB_NOTIFY op and emit a valid CB_SEQUENCE-only compound so
> + * the client still advances its slot. Flag the failure so the done
> + * handler recalls the delegation and the missed notification is not
> + * silently lost. The flag is written here in the transmit path and read
> + * in the done handler; the two are serialized phases of the same
> + * rpc_task, so no additional barrier is needed.
> + */
> + ncn->ncn_encode_err = true;

This flag is zeroed only once, at allocation time in alloc_init_dir_deleg().
It is never cleared in nfsd4_cb_notify_prepare().

Since nfsd4_cb_notify_release() can requeue the callback (via
nfsd4_run_cb_notify) when events arrive while a callback is in flight,
->prepare may encode cleanly and return true, but nfsd4_cb_notify_done()
still observes the stale ncn_encode_err == true and calls
nfsd_break_one_deleg() -- discarding a good notification and recalling
the delegation unnecessarily.


> + xdr_truncate_encode(xdr, start);
> + encode_cb_nops(&hdr);
> }
>
> static int nfs4_xdr_dec_cb_notify(struct rpc_rqst *rqstp,

> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 0a15d7f3b543..513cbc1a583f 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c

> @@ -3471,19 +3472,146 @@ nfsd4_cb_getattr_release(struct nfsd4_callback *cb)
> nfs4_put_stid(&dp->dl_stid);
> }
>
> +static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
> +{
> + bool queued;
> +
> + if (test_and_set_bit(NFSD4_CALLBACK_RUNNING, &dp->dl_recall.cb_flags))
> + return;
> +
> + /*
> + * We're assuming the state code never drops its reference
> + * without first removing the lease. Since we're in this lease
> + * callback (and since the lease code is serialized by the
> + * flc_lock) we know the server hasn't removed the lease yet, and
> + * we know it's safe to take a reference.
> + */
> + refcount_inc(&dp->dl_stid.sc_count);
> + queued = nfsd4_run_cb(&dp->dl_recall);
> + WARN_ON_ONCE(!queued);
> + if (!queued) {
> + refcount_dec(&dp->dl_stid.sc_count);
> + clear_bit(NFSD4_CALLBACK_RUNNING, &dp->dl_recall.cb_flags);
> + }
> +}

nfsd_break_one_deleg() does an unconditional
refcount_inc(&dp->dl_stid.sc_count), and its comment justifies this
with "the lease code is serialized by the flc_lock." That invariant
holds when called from nfsd_break_deleg_cb() under flc_lock, but
nfsd4_cb_notify_prepare() runs on a workqueue WITHOUT flc_lock. Its
out_recall: path calls nfsd_break_one_deleg(dp)
directly. The delegation can be concurrently destroyed with sc_count
already at zero, making this an inc-from-zero.

The dispatch path nfsd4_run_cb_notify already does this correctly with
refcount_inc_not_zero. The out_recall path needs the same guard (skip
the recall / bail if the refcount is already zero).

I notice that the last unapplied patch ("nfsd: add
support to CB_NOTIFY for dir attribute changes") rewrites the guard
"if (count > NOTIFY4_EVENT_QUEUE_SIZE)" into "if (count > limit)" with
limit = NOTIFY4_EVENT_QUEUE_SIZE - 1 when NOTIFY4_CHANGE_DIR_ATTRS is
requested. That turns the previously-dead overflow branch into a live,
routine path to out_recall, which adds another normal-operation route
into this unlocked recall.


--
Chuck Lever