Re: [PATCH] btrfs: fix a potential racy

From: David Sterba
Date: Wed Apr 22 2020 - 17:46:13 EST


On Sat, Apr 18, 2020 at 08:59:07PM -0500, wu000273@xxxxxxx wrote:
> From: Qiushi Wu <wu000273@xxxxxxx>
>
> In function reada_find_extent and reada_extent_put, kref_get(&zone->refcnt)
> are not called in a lock context. Potential racy may happen. It's possible
> that thread1 decreases the kref to 0, and thread2 increases the kref to 1,
> then thread1 releases the pointer.

There are several things I don't see or understand why they woudl be a
problem.

kref_get does not need to be take under locks in case it's not the first
reference or if it's clear that eg. the caller has taken a reference and
it'll never go to 0.

The references are supposed to be lightweight so taking the locks per
kref_get does not follow a good pattern.

The kref type is a wrapper around refcount_t, that detects if there's an
increment from 0->1, similar to what you suggest that could happen. It
might be tricky to actually hit that case so I'm not saying that it's
all ok, just that we haven't seen that so far.

Your description of the race needs to be expanded as it's too terse,
where exactly the increments could happen, how would be the calls in the
two cpus interleaved, like

cpu1 cpu2

reada_find_extent()
kref_get
...
reada_find_extent()
kref_put (last put, refs == 0)
kref_get (inc from zero)
...

Then it's easer to reason about the actual races and the context where
it could happen. Eg. btrfs_reada_add adds one reference from the
beginning, so the final put does not happen in the middle of
reada_find_extent unless something would be seriously broken and that
would manifest very clearly.