Re: [PATCH 3/5] padata: grab parallel_data refcnt for reorder

From: Daniel Jordan
Date: Fri Oct 28 2022 - 12:06:03 EST


On Wed, Oct 19, 2022 at 10:37:06AM +0200, Nicolai Stange wrote:
> On entry of padata_do_serial(), the in-flight padata_priv owns a reference
> to the associated parallel_data instance.
>
> However, as soon as the padata_priv got enqueued on the reorder list, it
> can be completed from a different context, causing the reference to get
> released in the course.
>
> This would potentially cause UAFs from the subsequent padata_reorder()
> operations invoked from the enqueueing padata_do_serial() or from the
> reorder work.
>
> Note that this is a purely theroretical concern, the problem has never been
> actually observed -- it would require multiple pcrypt request submissions
> racing against each other, ultimately a pcrypt instance destruction
> (DELALG) short after request completions as well as unfortunate timing.
>
> However, for the sake of correctness, it is still worth fixing.
>
> Make padata_do_serial() grab a reference count on the parallel_data for
> the subsequent reorder operation(s). As long as the padata_priv has not
> been enqueued, this is safe, because as mentioned above, in-flight
> pdata_privs own a reference already.
>
> Note that padata_reorder() might schedule another padata_reorder() work
> and thus, care must be taken to not prematurely release that "reorder
> refcount" from padata_do_serial() again in case that has happened.
> Make padata_reorder() return a bool for indicating whether or not a
> reorder work has been scheduled. Let padata_do_serial() drop its refcount
> only if this is not the case. Accordingly, make the reorder work handler,
> invoke_padata_reorder(), drop it then as appropriate.
>
> A remark on the commit chosen for the Fixes tag reference below: before
> commit bbefa1dd6a6d ("crypto: pcrypt - Avoid deadlock by using per-instance
> padata queues"), the padata_parallel lifetime had been tied to the
> padata_instance. The padata_free() resp. padata_stop() issued a
> synchronize_rcu() before padata_free_pd() from the instance destruction
> path, rendering UAFs from the padata_do_serial()=>padata_reorder()
> invocations with BHs disabled impossible AFAICS. With that, the
> padata_reorder() work remains to be considered. Before
> commit b128a3040935 ("padata: allocate workqueue internally"), the
> workqueue got destroyed (from pcrypt), hence drained, before the padata
> instance destruction, but this change moved that to after the
> padata_free_pd() invocation from __padata_free(). So, while the Fixes
> reference from below is most likely technically correct, I would still like
> to reiterate that this problem is probably hard to trigger in practice,
> even more so before commit bbefa1dd6a6d ("crypto: pcrypt - Avoid deadlock
> by using per-instance padata queues").
>
> Fixes: b128a3040935 ("padata: allocate workqueue internally")
> Signed-off-by: Nicolai Stange <nstange@xxxxxxx>
> ---
> kernel/padata.c | 26 +++++++++++++++++++++-----
> 1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/padata.c b/kernel/padata.c
> index 0bf8c80dad5a..b79226727ef7 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -275,7 +275,7 @@ static struct padata_priv *padata_find_next(struct parallel_data *pd,
> return padata;
> }
>
> -static void padata_reorder(struct parallel_data *pd)
> +static bool padata_reorder(struct parallel_data *pd)
> {
> struct padata_instance *pinst = pd->ps->pinst;
> int cb_cpu;
> @@ -294,7 +294,7 @@ static void padata_reorder(struct parallel_data *pd)
> * care for all the objects enqueued during the holdtime of the lock.
> */
> if (!spin_trylock_bh(&pd->lock))
> - return;
> + return false;
>
> while (1) {
> padata = padata_find_next(pd, true);
> @@ -331,17 +331,23 @@ static void padata_reorder(struct parallel_data *pd)
>
> reorder = per_cpu_ptr(pd->reorder_list, pd->cpu);
> if (!list_empty(&reorder->list) && padata_find_next(pd, false))
> - queue_work(pinst->serial_wq, &pd->reorder_work);
> + return queue_work(pinst->serial_wq, &pd->reorder_work);
> +
> + return false;
> }
>
> static void invoke_padata_reorder(struct work_struct *work)
> {
> struct parallel_data *pd;
> + bool keep_refcnt;
>
> local_bh_disable();
> pd = container_of(work, struct parallel_data, reorder_work);
> - padata_reorder(pd);
> + keep_refcnt = padata_reorder(pd);
> local_bh_enable();
> +
> + if (!keep_refcnt)
> + padata_put_pd(pd);
> }
>
> static void padata_serial_worker(struct work_struct *serial_work)
> @@ -392,6 +398,15 @@ void padata_do_serial(struct padata_priv *padata)
> struct padata_list *reorder = per_cpu_ptr(pd->reorder_list, hashed_cpu);
> struct padata_priv *cur;
>
> + /*
> + * The in-flight padata owns a reference on pd. However, as
> + * soon as it's been enqueued on the reorder list, another
> + * task can dequeue and complete it, thereby dropping the
> + * reference. Grab another reference here, it will eventually
> + * be released from a reorder work, if any, or below.
> + */
> + padata_get_pd(pd);
> +
> spin_lock(&reorder->lock);
> /* Sort in ascending order of sequence number. */
> list_for_each_entry_reverse(cur, &reorder->list, list)
> @@ -407,7 +422,8 @@ void padata_do_serial(struct padata_priv *padata)
> */
> smp_mb();
>
> - padata_reorder(pd);
> + if (!padata_reorder(pd))
> + padata_put_pd(pd);

do_serial is supposed to be called with BHs disabled and will be in
every case after a fix for a separate issue that I plan to send this
cycle. Given that it (will soon...) always happen under RCU protection,
part of this issue could be addressed like this, which puts the expense
of dealing with this rare problem in the slow path:

diff --git a/kernel/padata.c b/kernel/padata.c
index 0bf8c80dad5a..cd6740ae6629 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -1110,6 +1110,12 @@ void padata_free_shell(struct padata_shell *ps)
if (!ps)
return;

+ /*
+ * Wait for all _do_serial calls to finish to avoid touching freed pd's
+ * and ps's.
+ */
+ synchronize_rcu();
+
mutex_lock(&ps->pinst->lock);
list_del(&ps->list);
padata_put_pd(rcu_dereference_protected(ps->pd, 1));

pcrypt calls padata_free_shell() when all outstanding transforms (and
thus requests, I think) have been freed/completed, so no new task can
come into padata_reorder. synchronize_rcu() then flushes out any
remaining _reorder calls.

This doesn't deal with pending reorder_work items, but we can fix it
later in the series.

What do you think?