Re: [RFC/REFACT] Refactoring and significantly reducing code complexity

From: Daniel Jordan
Date: Wed Oct 25 2023 - 14:08:49 EST


Hello,

On Thu, Sep 28, 2023 at 04:53:38PM +0800, Wang Jinchao wrote:
> This is a refactored version with the following main changes:

The RFC overall is a nice simplification, and the basic approach of using an
ordered workqueue seems to work.

> - The parallel workqueue no longer uses the WQ_UNBOUND attribute

What's the justification here? If it improves performance, please show
numbers. Earlier tests[0] showed a large improvement when adding this
flag.

[0] https://lore.kernel.org/linux-crypto/20190906014029.3345-1-daniel.m.jordan@xxxxxxxxxx/

> - Removal of CPU-related logic, sysfs-related interfaces

I agree with Steffen that we should continue to honor the cpumasks that the
user sets.

The simplest way I see to make the parallel mask work with your refactor is to
just make the parallel workqueue unbound again, since setting workqueue
attributes is only allowed for unbound, and bring back some of the plumbing
that leads to the apply_workqueue_attrs call.

The serial mask is trickier. Changing attributes of an ordered workqueue (the
cpumask in this case) makes the kernel throw a warning...

static int apply_workqueue_attrs_locked
...
/* creating multiple pwqs breaks ordering guarantee */
if (!list_empty(&wq->pwqs)) {
if (WARN_ON(wq->flags & __WQ_ORDERED_EXPLICIT))
return -EINVAL;

wq->flags &= ~__WQ_ORDERED;
}

...but I'm not sure this is a fundamental limitation. The changelog of
0a94efb5acbb ("workqueue: implicit ordered attribute should be overridable")
says changes to "max_active and some attribute changes" are rejected, but it
might be possible to relax the warning to allow setting a cpumask while still
rejecting other changes.

> Testing was conducted using ltp's pcrypt_aead01, and the execution time
> comparison between the old and new versions is as follows:
>
> Old Version:
> real 0m27.451s
> user 0m0.031s
> sys 0m0.260s
>
> New Version:
> real 0m21.351s
> user 0m0.023s
> sys 0m0.260s

Great speedup. A test that runs many requests for a long time in parallel is
also good to run, such as [0].

> @@ -986,57 +281,27 @@ struct padata_instance *padata_alloc(const char *name)
...
> + pinst->serial_wq = alloc_ordered_workqueue ("%s_serial",
> + WQ_MEM_RECLAIM | WQ_FREEZABLE,
> + name);

Why add these two WQ_ flags? Also, whitespace is kinda funky.