Re: [PATCH v8 02/10] kernfs: make ->attr.open RCU protected.

From: Imran Khan
Date: Mon Apr 25 2022 - 21:54:59 EST


Hello Tejun,

On 23/4/22 2:19 am, Tejun Heo wrote:
> On Sun, Apr 10, 2022 at 12:37:11PM +1000, Imran Khan wrote:
>> static int kernfs_seq_show(struct seq_file *sf, void *v)
>> {
>> struct kernfs_open_file *of = sf->private;
>> + struct kernfs_open_node *on = rcu_dereference_raw(of->kn->attr.open);
>
> I suppose raw deref is safe because @on can't go away while @of is alive,
> right?

Yes.

If that's the case, please factor out of -> on dereferencing into a
> helper and put a comment there explaining why the raw deref is safe.
>

Sure, will put dereferncing in a separate helper in next version.


>> @@ -201,7 +203,8 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>> goto out_free;
>> }
>>
>> - of->event = atomic_read(&of->kn->attr.open->event);
>> + on = rcu_dereference_raw(of->kn->attr.open);
>
> cuz we don't wanna sprinkle raw derefs in multiple places without
> explanation like this.
>
Agree.

> ...
>> /**
>> @@ -566,24 +567,30 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
>> static void kernfs_put_open_node(struct kernfs_node *kn,
>> struct kernfs_open_file *of)
>> {
>> - struct kernfs_open_node *on = kn->attr.open;
>> - unsigned long flags;
>> + struct kernfs_open_node *on;
>> +
>> + /* ->attr.open NULL means there are no more open files */
>> + if (rcu_dereference_raw(kn->attr.open) == NULL)
>> + return;
>
> For pointer value check, what you want is rcu_access_pointer(). That said,
> tho, why is this being called if no one is linked on it? Before removing the
> refcnt, that'd be the same as trying to put a 0 ref. How does that happen?

Yeah this check surely should not be needed. I will remove it in next
version.

> Also, can you please rename it to unlink or something of the sort? It's
> confusing to call it put when there's no refcnt.
>

sure I will rename _put_ to _unlink_.

>>
>> mutex_lock(&kernfs_open_file_mutex);
>> - spin_lock_irqsave(&kernfs_open_node_lock, flags);
>> +
>> + on = rcu_dereference_protected(kn->attr.open,
>> + lockdep_is_held(&kernfs_open_file_mutex));
>
> Again, a better way to do it would be defining a kn -> on accessor which
> encodes the safe way to deref and use it. The deref rule is tied to the
> deref itself not the callsite.
>

Okay I will factor this out in a separate helper.

[...]

>> static int kernfs_fop_open(struct inode *inode, struct file *file)
>> @@ -765,12 +772,13 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
>> if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
>> return;
>>
>> - on = kn->attr.open;
>> - if (!on)
>> + if (rcu_dereference_raw(kn->attr.open) == NULL)
>> return;
>
> rcu_access_pointer again and you gotta explain why the lockless check is
> safe.
>

The lockless check is safe because no one will be adding to ->attr.open
at this point of time. This allows early bail out if ->attr.open in
already NULL. And if kernfs_put_open_node makes ->attr.open NULL, it
does this under open_file_mutex so subsequent check under
open_file_mutex will make sure to bail out if kernfs_put_open_node won
the race.
I will put an explanatory comment in the code, explaining the same.

[...]

>> @@ -912,14 +920,13 @@ void kernfs_notify(struct kernfs_node *kn)
>> return;
>>
>> /* kick poll immediately */
>> - spin_lock_irqsave(&kernfs_open_node_lock, flags);
>> - on = kn->attr.open;
>> + rcu_read_lock();
>> + on = rcu_dereference(kn->attr.open);
>> if (on) {
>> atomic_inc(&on->event);
>> wake_up_interruptible(&on->poll);
>> }
>> - spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
>> -
>> + rcu_read_unlock();
>
> An explanation of why this is safe in terms of event ordering would be great
> here.
>

This is safe because here we don't need to refcnt ->on in this case. If
writer (kernfs_put_open_node) has already made ->attr.open NULL we will
bail out. If kernfs_notify got an old ->attr.open we can still safely
process the event, even if kernfs_put_open_node updates ->attr.open to
NULL in parallel.
In both the cases the behaviour/order will be same as earlier code that
used kernfs_open_node_lock.
Please let me know if this answers your query or if something is still
missing.


Thanks
-- Imran