RE: [PATCH 02/30] habanalabs: fix race when waiting on encaps signal

From: Dani Liberman
Date: Mon Jan 24 2022 - 13:22:47 EST

On Sun, 23 Jan 2022 02:27 +0200 Hillf Danton wrote:
> On Sat, 22 Jan 2022 21:57:03 +0200 Oded Gabbay wrote:
> > @@ -2063,13 +2063,22 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
> > idp = &ctx->sig_mgr.handles;
> > idr_for_each_entry(idp, encaps_sig_hdl, id) {
> > if (encaps_sig_hdl->cs_seq == signal_seq) {
> > - handle_found = true;
> > /* get refcount to protect removing
> > * this handle from idr, needed when
> > * multiple wait cs are used with offset
> > * to wait on reserved encaps signals.
> > */
> > kref_get(&encaps_sig_hdl->refcount);
> > + /*
> > + * Since kref_put of this handle is executed outside the
> > + * current lock, it is possible that the handle refcount
> > + * is 0 but it is yet to be removed from the list. In this case
> > + * we need to consider the handle as not valid. To ensure
> > + * that the handle is valid, its refcount must be bigger
> > + * than 1.
> > + */
> > + if (kref_read(&encaps_sig_hdl->refcount) > 1)
> > + handle_found = true;
> > break;
> > }
> > }
> > --
> > 2.25.1
>
> I wonder why kref_get_unless_zero() does not fit here, given the chance
> of bumping a zero refcount up?
>
> Hillf

Thanks, you are right.
I will send an updated patch with your suggestion.
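For reference, the difference between the two approaches can be modeled in userspace. This is a hypothetical sketch: struct model_kref and the model_* helpers are stand-ins written for illustration, built on C11 atomics rather than the kernel's kref/refcount_t (whose saturation and warning checks are not reproduced here). model_kref_get_unless_zero() follows the semantics of kref_get_unless_zero(): the zero check and the increment happen in a single compare-and-swap, so a handle whose final put has already run is never revived. model_get_then_check() mirrors the kref_get() followed by kref_read() > 1 pattern from the patch above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct kref. */
struct model_kref {
	atomic_uint refcount;
};

/* Like kref_init(): a new object starts with one reference. */
static void model_kref_init(struct model_kref *k)
{
	atomic_init(&k->refcount, 1);
}

/*
 * Models kref_get_unless_zero(): take a reference only if the count is
 * non-zero. Check and increment are one atomic compare-and-swap, so a
 * concurrent final put cannot slip in between them.
 */
static bool model_kref_get_unless_zero(struct model_kref *k)
{
	unsigned int old = atomic_load(&k->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&k->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Models kref_put() without a release callback; true on the final put. */
static bool model_kref_put(struct model_kref *k)
{
	unsigned int old = atomic_fetch_sub(&k->refcount, 1);

	assert(old != 0 && "put on an already released reference");
	return old == 1;
}

/*
 * For contrast, the pattern from the original patch: increment
 * unconditionally, then inspect the count in a separate step.
 */
static bool model_get_then_check(struct model_kref *k)
{
	atomic_fetch_add(&k->refcount, 1);
	return atomic_load(&k->refcount) > 1;
}
```

On a handle whose count has already dropped to 0, model_get_then_check() returns false for a first waiter but true for a second one, because the first waiter's increment left the count at 1. The second waiter then treats a handle that is already being released as valid; model_kref_get_unless_zero() refuses both.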

Dani