Re: [RFC PATCH v3 08/15] perf workqueue: add queue_work and flush_workqueue functions

From: Riccardo Mancini
Date: Tue Aug 31 2021 - 12:23:35 EST


Hi,

On Tue, 2021-08-24 at 12:40 -0700, Namhyung Kim wrote:
> On Fri, Aug 20, 2021 at 3:54 AM Riccardo Mancini <rickyman7@xxxxxxxxx> wrote:
> >
> > This patch adds functions to queue and wait for work_structs, and
> > related tests.
> >
> > When a new work item is added, the workqueue first checks if there
> > are threads to wake up. If so, it wakes one up with the given work item,
> > otherwise it will pick the next round-robin thread and queue the work
> > item to its queue. A thread which completes its queue will go to sleep.
> >
> > The round-robin mechanism is implemented through the next_worker
> > attribute which will point to the next worker to be chosen for queueing.
> > When work is assigned to that worker or when the worker goes to sleep,
> > the pointer is moved to the next worker in the busy_list, if any.
> > When a worker is woken up, it is added in the busy list just before the
> > next_worker, so that it will be chosen as last (it's just been assigned
> > a work item).
>
> Do we really need this?  I think some of the complexity comes
> because of this.  Can we simply put the works in a list in wq
> and workers take it out with a lock?  Then the kernel will
> distribute the works among the threads for us.
>
> Maybe we can get rid of worker->lock too..

Having a per-thread queue has some benefits which are very useful in our case:
- it should scale better to bigger machines than a shared queue (looking at
both tests from Jiri, this version looks somewhat better than v2, but they
were run under different conditions, so further tests comparing the two
versions on big machines would be useful).
- it is possible to choose the worker to execute work on, which is used in the
evlist patchset (where threads can be pinned to a cpu and evlist operations can
be done on them).
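
To make the policy concrete, here is a rough single-threaded model of it. It
is only a sketch: the array layout and every name in it (wq_model,
worker_model, wq_model_queue, wq_model_queue_on) are made up for illustration
and do not match the patch, which uses lists and per-worker locking. It shows
the two properties above: work goes to a sleeping worker first and round-robin
otherwise, and a caller can also target a specific worker.

```c
#include <assert.h>

#define MAX_WORKERS 8

/* Hypothetical model of one worker: no real thread, just its state. */
struct worker_model {
	int pending;	/* work items queued on this worker */
	int sleeping;	/* 1 if idle, 0 if it has work */
};

/* Hypothetical model of the workqueue with a round-robin cursor. */
struct wq_model {
	struct worker_model workers[MAX_WORKERS];
	int nr_workers;
	int next_worker;	/* next worker to pick when none is sleeping */
};

static void wq_model_init(struct wq_model *wq, int nr)
{
	int i;

	assert(nr > 0 && nr <= MAX_WORKERS);
	wq->nr_workers = nr;
	wq->next_worker = 0;
	for (i = 0; i < nr; i++) {
		wq->workers[i].pending = 0;
		wq->workers[i].sleeping = 1;
	}
}

/* Queue one item on an explicitly chosen worker (e.g. one pinned to a CPU). */
static int wq_model_queue_on(struct wq_model *wq, int idx)
{
	assert(idx >= 0 && idx < wq->nr_workers);
	wq->workers[idx].pending++;
	wq->workers[idx].sleeping = 0;	/* a sleeping worker gets woken up */
	return idx;
}

/* Queue one item: prefer a sleeping worker, else pick round-robin. */
static int wq_model_queue(struct wq_model *wq)
{
	int i, idx;

	for (i = 0; i < wq->nr_workers; i++)
		if (wq->workers[i].sleeping)
			return wq_model_queue_on(wq, i);

	idx = wq->next_worker;
	wq->next_worker = (idx + 1) % wq->nr_workers;
	return wq_model_queue_on(wq, idx);
}
```

With a single shared list, wq_model_queue_on() would have no equivalent, since
any thread could pick up any item.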

Of course, it adds some complexity over the shared queue, for example:
- the next_worker pointer to implement the round-robin policy, for which there
may be a cleaner implementation.
- the thread "self-registration", which I think can be dropped in favor of an
array inside the workqueue (the max number of threads is limited, so
self-registration does not really add much flexibility, and it adds contention
on the workqueue lock when threads are spun up). Getting rid of it could reduce
workqueue spin-up and stop time.
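
What I have in mind for the array variant is roughly the following. Again,
this is a sketch with invented names (sketch_wq, sketch_worker,
sketch_wq_start, sketch_wq_stop) and an arbitrary limit, not code from the
series: the creator fills in each slot before starting the thread, so the
thread has nothing to register and takes no workqueue lock on startup.

```c
#include <assert.h>
#include <pthread.h>

#define SKETCH_MAX_WORKERS 4	/* arbitrary limit, for illustration only */

struct sketch_worker {
	pthread_t tid;
	int idx;	/* slot index, assigned by the creator */
	int started;	/* set by the thread itself, read after join */
};

struct sketch_wq {
	struct sketch_worker workers[SKETCH_MAX_WORKERS];
	int nr_workers;
};

static void *sketch_worker_fn(void *arg)
{
	struct sketch_worker *w = arg;

	/* The slot already exists: no registration step, no shared lock. */
	w->started = 1;
	return NULL;
}

/* Start up to nr threads; returns how many were actually created. */
static int sketch_wq_start(struct sketch_wq *wq, int nr)
{
	int i;

	assert(nr > 0 && nr <= SKETCH_MAX_WORKERS);
	wq->nr_workers = 0;
	for (i = 0; i < nr; i++) {
		struct sketch_worker *w = &wq->workers[i];

		w->idx = i;
		w->started = 0;
		if (pthread_create(&w->tid, NULL, sketch_worker_fn, w))
			break;
		wq->nr_workers++;
	}
	return wq->nr_workers;
}

/* Join every thread that was started. */
static void sketch_wq_stop(struct sketch_wq *wq)
{
	int i;

	for (i = 0; i < wq->nr_workers; i++)
		pthread_join(wq->workers[i].tid, NULL);
	wq->nr_workers = 0;
}
```

Only the creating thread writes wq->nr_workers here, which is why no lock is
needed around it in this sketch.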

Thanks,
Riccardo

>
> Thanks,
> Namhyung