Re: [RFC PATCH v4 05/13] workqueue, ktask: renice helper threads to prevent starvation

From: Tejun Heo
Date: Tue Nov 13 2018 - 11:34:08 EST


Hello, Daniel.

On Mon, Nov 05, 2018 at 11:55:50AM -0500, Daniel Jordan wrote:
>  static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
> -                             bool from_cancel)
> +                             struct nice_work *nice_work, int flags)
>  {
>          struct worker *worker = NULL;
>          struct worker_pool *pool;
> @@ -2868,11 +2926,19 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
>          if (pwq) {
>                  if (unlikely(pwq->pool != pool))
>                          goto already_gone;
> +
> +                /* not yet started, insert linked work before work */
> +                if (unlikely(flags & WORK_FLUSH_AT_NICE))
> +                        insert_nice_work(pwq, nice_work, work);

So, I'm not sure this works that well. For example, what if the work
item is waiting on other work items that are at lower priority? Also,
in this case, it'd be a lot simpler to dequeue the work item and
execute it synchronously.

>          } else {
>                  worker = find_worker_executing_work(pool, work);
>                  if (!worker)
>                          goto already_gone;
>                  pwq = worker->current_pwq;
> +                if (unlikely(flags & WORK_FLUSH_AT_NICE)) {
> +                        set_user_nice(worker->task, nice_work->nice);
> +                        worker->flags |= WORKER_NICED;
> +                }
>          }

I'm not sure about this. Can you see whether canceling & executing
synchronously is enough to address the latency regression?
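
To make the "cancel & execute synchronously" idea concrete, here is a
rough userspace sketch. Everything in it (toy_work, toy_queue,
toy_cancel_work, toy_flush_sync) is a hypothetical stand-in and not
workqueue code; it only models the control flow: if the item is still
pending, take it off the queue and run it in the caller's context, so
it runs at the caller's priority and never waits behind lower-priority
items. In the kernel this would presumably map to something like
cancel_work_sync() followed by a direct call of the work function, but
that mapping is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_work {
	void (*func)(struct toy_work *w);
	struct toy_work *next;
	bool pending;
	int runs;		/* how many times func has been called */
};

struct toy_queue {
	struct toy_work *head;	/* singly linked FIFO of pending items */
};

/* Append w to the tail of q unless it is already pending. */
static void toy_queue_work(struct toy_queue *q, struct toy_work *w)
{
	struct toy_work **pp = &q->head;

	assert(w->func != NULL);
	if (w->pending)
		return;
	while (*pp)
		pp = &(*pp)->next;
	w->next = NULL;
	w->pending = true;
	*pp = w;
}

/* Unlink w from q if it is still pending; true if it was dequeued. */
static bool toy_cancel_work(struct toy_queue *q, struct toy_work *w)
{
	struct toy_work **pp;

	for (pp = &q->head; *pp; pp = &(*pp)->next) {
		if (*pp == w) {
			*pp = w->next;
			w->next = NULL;
			w->pending = false;
			return true;
		}
	}
	return false;
}

/*
 * Flush by stealing: if w has not started yet, dequeue it and run it
 * right here in the caller's context. Returns false if w was not
 * pending (already ran or never queued); a real implementation would
 * still have to wait for an in-flight execution in that case.
 */
static bool toy_flush_sync(struct toy_queue *q, struct toy_work *w)
{
	if (!toy_cancel_work(q, w))
		return false;
	w->func(w);
	return true;
}

/* Trivial work function that just counts invocations. */
static void count_run(struct toy_work *w)
{
	w->runs++;
}
```

With two items a and b queued in that order, toy_flush_sync(&q, &b)
runs b immediately in the caller and leaves a queued and untouched, so
no reniced linked work item is needed for the not-yet-started case.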

Thanks.

--
tejun