Re: [PATCH v6 10/20] nfsd: add notification handlers for dir events

From: Jeff Layton

Date: Fri Jun 12 2026 - 14:38:31 EST


On Fri, 2026-06-12 at 13:51 -0400, Chuck Lever wrote:
> On Thu, Jun 11, 2026, at 1:50 PM, Jeff Layton wrote:
> > Add the necessary parts to accept a fsnotify callback for directory
> > change event and create a CB_NOTIFY request for it. When a dir nfsd_file
> > is created set a handle_event callback to handle the notification.
> >
> > Use that to allocate a nfsd_notify_event object and then hand off a
> > reference to each delegation's CB_NOTIFY. If anything fails along the
> > way, recall any affected delegations.
> >
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
>
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index ca4dd2f969eb..59378751d596 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
>
> > @@ -904,13 +908,45 @@ static void nfs4_xdr_enc_cb_notify(struct rpc_rqst *req,
> > encode_cb_sequence4args(xdr, cb, &hdr);
> >
> > /*
> > - * FIXME: get stateid and fh from delegation. Inline the cna_changes
> > - * buffer, and zero it.
> > + * nfsd4_cb_notify_prepare() sized the payload against a single page,
> > + * but did not account for the compound, sequence, stateid, and
> > + * filehandle encoded here. If the variable-length encode overflows the
> > + * backchannel send buffer, roll back to before the operation so that a
> > + * truncated CB_NOTIFY is never placed on the wire.
> > */
> > - xdrgen_encode_CB_NOTIFY4args(xdr, &args);
> > + start = xdr_stream_pos(xdr);
> > +
> > + p = xdr_reserve_space(xdr, 4);
> > + if (!p)
> > + goto out_err;
> > + *p = cpu_to_be32(OP_CB_NOTIFY);
>
> Please use xdr_stream_encode_u32 for this purpose.
>

Ok

>
> > +
> > + args.cna_stateid.seqid = dp->dl_stid.sc_stateid.si_generation;
> > + memcpy(&args.cna_stateid.other, &dp->dl_stid.sc_stateid.si_opaque,
> > + ARRAY_SIZE(args.cna_stateid.other));
> > + args.cna_fh.len = dp->dl_stid.sc_file->fi_fhandle.fh_size;
> > + args.cna_fh.data = dp->dl_stid.sc_file->fi_fhandle.fh_raw;
> > + args.cna_changes.count = ncn->ncn_nf_cnt;
> > + args.cna_changes.element = ncn->ncn_nf;
> > + if (!xdrgen_encode_CB_NOTIFY4args(xdr, &args))
> > + goto out_err;
> >
> > hdr.nops++;
> > encode_cb_nops(&hdr);
> > + return;
> > +
> > +out_err:
> > + /*
> > + * Drop the CB_NOTIFY op and emit a valid CB_SEQUENCE-only compound so
> > + * the client still advances its slot. Flag the failure so the done
> > + * handler recalls the delegation and the missed notification is not
> > + * silently lost. The flag is written here in the transmit path and read
> > + * in the done handler; the two are serialized phases of the same
> > + * rpc_task, so no additional barrier is needed.
> > + */
> > + ncn->ncn_encode_err = true;
>
> This flag is zeroed only once, at allocation time in alloc_init_dir_deleg().
> It is never cleared in nfsd4_cb_notify_prepare().
>
> Since nfsd4_cb_notify_release() can requeue the callback (via
> nfsd4_run_cb_notify) when events arrive while a callback is in flight,
> ->prepare may encode cleanly and return true, but nfsd4_cb_notify_done()
> still observes the stale ncn_encode_err == true and calls
> nfsd_break_one_deleg() -- discarding a good notification and recalling
> the delegation unnecessarily.
>

Ok, so we need to reset this in ->prepare.

>
> > + xdr_truncate_encode(xdr, start);
> > + encode_cb_nops(&hdr);
> > }
> >
> > static int nfs4_xdr_dec_cb_notify(struct rpc_rqst *rqstp,
>
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 0a15d7f3b543..513cbc1a583f 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
>
> > @@ -3471,19 +3472,146 @@ nfsd4_cb_getattr_release(struct nfsd4_callback *cb)
> > nfs4_put_stid(&dp->dl_stid);
> > }
> >
> > +static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
> > +{
> > + bool queued;
> > +
> > + if (test_and_set_bit(NFSD4_CALLBACK_RUNNING, &dp->dl_recall.cb_flags))
> > + return;
> > +
> > + /*
> > + * We're assuming the state code never drops its reference
> > + * without first removing the lease. Since we're in this lease
> > + * callback (and since the lease code is serialized by the
> > + * flc_lock) we know the server hasn't removed the lease yet, and
> > + * we know it's safe to take a reference.
> > + */
> > + refcount_inc(&dp->dl_stid.sc_count);
> > + queued = nfsd4_run_cb(&dp->dl_recall);
> > + WARN_ON_ONCE(!queued);
> > + if (!queued) {
> > + refcount_dec(&dp->dl_stid.sc_count);
> > + clear_bit(NFSD4_CALLBACK_RUNNING, &dp->dl_recall.cb_flags);
> > + }
> > +}
>
> nfsd_break_one_deleg() does an unconditional
> refcount_inc(&dp->dl_stid.sc_count), and its comment justifies this
> with "the lease code is serialized by the flc_lock." That invariant
> holds when called from nfsd_break_deleg_cb() under flc_lock, but
> nfsd4_cb_notify_prepare() runs on a workqueue WITHOUT flc_lock. Its
> out_recall: path calls nfsd_break_one_deleg(dp)
> directly. The delegation can be concurrently destroyed with sc_count
> already at zero, making this an inc-from-zero.
>
> The dispatch path nfsd4_run_cb_notify already does this correctly with
> refcount_inc_not_zero. The out_recall path needs the same guard (skip
> the recall / bail if the refcount is already zero).
>
> I notice that the last unapplied patch ("nfsd: add
> support to CB_NOTIFY for dir attribute changes") rewrites the guard
> "if (count > NOTIFY4_EVENT_QUEUE_SIZE)" into "if (count > limit)" with
> limit = NOTIFY4_EVENT_QUEUE_SIZE - 1 when NOTIFY4_CHANGE_DIR_ATTRS is
> requested. That turns the previously-dead overflow branch into a live,
> routine path to out_recall, which adds another normal-operation route
> into this unlocked recall.
>

This wart has been there a long time, and we just papered over it with
the lock.

I think we need to do a refcount_inc_not_zero() in
nfsd_break_one_deleg() and just return without queuing the callback if
it's already at 0. That means that the recall is racing with the lease
teardown, so I think the right thing to do is to not send the recall in
that case.
--
Jeff Layton <jlayton@xxxxxxxxxx>