Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

From: Daniel Jordan
Date: Fri Jul 12 2019 - 12:10:28 EST


On Fri, Jul 12, 2019 at 12:10:12PM +0200, Steffen Klassert wrote:
> On Fri, Jul 12, 2019 at 06:06:36PM +0800, Herbert Xu wrote:
> > Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> wrote:
> > >
> > > CPU0 CPU1
> > >
> > > padata_reorder padata_do_serial
> > > LOAD reorder_objects // 0
> > > INC reorder_objects // 1
> > > padata_reorder
> > > TRYLOCK pd->lock // failed
> > > UNLOCK pd->lock
> >
> > I think this can't happen because CPU1 won't call padata_reorder
> > at all as it's the wrong CPU so it gets pushed back onto a work
> > queue which will go back to CPU0.

Thanks for looking at this.

When you say the wrong CPU, I think you're referring to the reorder_via_wq
logic in padata_do_serial. If so, I think my explanation was unclear, because
there were two padata jobs in flight and my tracepoints showed neither of them
punted padata_reorder to a workqueue. Let me expand on this, hopefully it
helps.

pcrypt used CPU 5 for its callback CPU. The first job in question, with
sequence number 16581, hashed to CPU 21 on my system. This is a more complete
version of what led to the hanging modprobe command.

modprobe (CPU2) kworker/21:1-293 (CPU21) kworker/5:2-276 (CPU5)
-------------------------- ------------------------ ----------------------
<submit job, seq_nr=16581>
...
padata_do_parallel
queue_work_on(21, ...)
<sleeps>
padata_parallel_worker
pcrypt_aead_dec
padata_do_serial
padata_reorder
| padata_get_next // returns job, seq_nr=16581
| // serialize seq_nr=16581
| queue_work_on(5, ...)
| padata_get_next // returns -EINPROGRESS
// padata_reorder doesn't return yet!
| | padata_serial_worker
| | pcrypt_aead_serial
| | <wake up modprobe>
| | <worker finishes>
<submit job, seq_nr=16582> | |
... | |
padata_do_parallel | |
queue_work_on(22, ...) | | (LOAD reorder_objects as 0 somewhere
<sleeps> | | in here, before the INC)
| | kworker/22:1-291 (CPU22)
| | ------------------------
| | padata_parallel_worker
| | pcrypt_aead_dec
| | padata_do_serial
| | // INC reorder_objects to 1
| | padata_reorder
| | // trylock fail, CPU21 _should_
| | // serialize 16582 but doesn't
| | <worker finishes>
| // deletes pd->timer
// padata_reorder returns
<worker finishes>
<keeps on sleeping lazily>

My tracepoints showed CPU22 increased reorder_objects to 1 but CPU21 read it as
0.

I think adding the full barrier guarantees the following ordering, and the
memory model people can correct me if I'm wrong:

CPU21 CPU22
------------------------ --------------------------
UNLOCK pd->lock
smp_mb()
LOAD reorder_objects
INC reorder_objects
spin_unlock(&pqueue->reorder.lock) // release barrier
TRYLOCK pd->lock

So if CPU22 has incremented reorder_objects but CPU21 reads 0 for it, CPU21
should also have unlocked pd->lock so CPU22 can get it and serialize any
remaining jobs.