Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq

From: Peter Zijlstra
Date: Fri Aug 07 2015 - 06:58:52 EST


On Wed, Feb 25, 2015 at 10:02:50PM +0100, Sebastian Andrzej Siewior wrote:

> >+static inline int swait_active(struct swait_queue_head *q)
> >+{
> >+ return !list_empty(&q->task_list);
>
> In RT there was a smp_mb() which you dropped and I assume you had
> reasons for it.

Yeah, RT didn't have a reason for the smp_mb() -- any barrier without a
comment is a bug :-)

Also waitqueue_active(), its counterpart, does not have a barrier there
either.

Nor did I see any reason for that mb to be there.

> I assumed that one can perform list_empty_careful()
> without a lock if the items were removed with list_del_init(). But since
> nothing in -RT blow up so far I guess this here is legal, too :)

Nobody will come and arrest us for software bugs -- yet ;-)

> >+/*
> >+ * The thing about the wake_up_state() return value; I think we can ignore it.
> >+ *
> >+ * If for some reason it would return 0, that means the previously waiting
> >+ * task is already running, so it will observe condition true (or has already).
> >+ */
> >+void swake_up_locked(struct swait_queue_head *q)
> >+{
> >+ struct swait_queue *curr;
> >+
> >+ list_for_each_entry(curr, &q->task_list, task_list) {
> >+ wake_up_process(curr->task);
>
> okay. So since we limit everything to TASK_NORMAL which has to sleep
> while on the list there is no need to check if we actually woken up
> someone.

Partly that, also that I don't see how that return value is meaningful
in the first place.

If it were to return false, the task was/is already running and it will
observe whatever condition we just satisfied to allow waking things up.

So either way around, we'll get (at least) 1 task running.

> >+ list_del_init(&curr->task_list);
> >+ break;
> >+ }
> >+}
> >+EXPORT_SYMBOL(swake_up_locked);



> >+void swake_up(struct swait_queue_head *q)
> >+{
> >+ unsigned long flags;
> >+
> >+ if (!swait_active(q))
> >+ return;
> >+
> >+ raw_spin_lock_irqsave(&q->lock, flags);
> >+ __swake_up_locked(q);
>
> I thing this should have been swake_up_locked() instead since
> __swake_up_locked() isn't part of this patch.
>
> Just a nitpick: later there is __prepare_to_swait() and __finish_swait()
> which have the __ prefix instead a _locked suffix. Not sure what is
> better for a better for a public API but maybe one way would be good.

Yeah, I suppose that's true ;-)

> >+ raw_spin_unlock_irqrestore(&q->lock, flags);
> >+}
> >+EXPORT_SYMBOL(swake_up);
> >+
> >+/*
> >+ * Does not allow usage from IRQ disabled, since we must be able to
> >+ * release IRQs to guarantee bounded hold time.
> >+ */
> >+void swake_up_all(struct swait_queue_head *q)
> >+{
> >+ struct swait_queue *curr, *next;
> >+ LIST_HEAD(tmp);
>
> WARN_ON(irqs_disabled()) ?

Lockdep should already catch that by virtue of using unconditional _irq
spinlock primitives.

> >+ if (!swait_active(q))
> >+ return;
> >+
> >+ raw_spin_lock_irq(&q->lock);
> >+ list_splice_init(&q->task_list, &tmp);
> >+ while (!list_empty(&tmp)) {
> >+ curr = list_first_entry(&tmp, typeof(curr), task_list);
> >+
> >+ wake_up_state(curr->task, state);
> >+ list_del_init(&curr->task_list);
>
> So because the task may timeout and remove itself from the list at
> anytime you need to hold the lock during wakeup and the removal from the
> list

Indeed.

> >+
> >+ if (list_empty(&tmp))
> >+ break;
> >+
> >+ raw_spin_unlock_irq(&q->lock);
>
> and you drop the lock after each iteration in case there is an IRQ
> pending or the task, that has been just woken up, has a higher priority
> than the current task and needs to get on the CPU.

> Not sure if this case matters:
> - _this_ task (wake_all) prio 120
> - first task in queue prio 10, RR
> - second task in queue prio 9, RR

Why complicate things? Better to not assume anything and just do the
simple correct thing.

> the *old* behavior would put the second task before the first task on
> CPU. The *new* behaviour puts the first task on the CPU after dropping
> the lock. The second task (that has a higher priority but nobody knows)
> has to wait until the first one is done (and anything else that might
> been woken up in the meantime with a higher prio than 120).

Irrelevant, we _must_ drop the lock in order to maintain bounded
behaviour.

> >+ raw_spin_lock_irq(&q->lock);
> >+ }
> >+ raw_spin_unlock_irq(&q->lock);
> >+}
> >+EXPORT_SYMBOL(swake_up_all);

> >+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
> this one has no users the __ suggests that it is locked edition. Maybe
> it is for the completionsâ

Yeah, who knows, I certainly do not anymore ;-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/