Re: [PATCH] firmware: wake all waiters

From: Luis R. Rodriguez
Date: Mon Jun 26 2017 - 19:32:05 EST


On Mon, Jun 26, 2017 at 02:44:17PM -0700, Linus Torvalds wrote:
> On Fri, Jun 23, 2017 at 4:37 PM, Jakub Kicinski
> <jakub.kicinski@xxxxxxxxxxxxx> wrote:
> > - swake_up(&fw_st->wq);
> > + swake_up_all(&fw_st->wq);
>
> Why is that code using the braindamaed "swait" model in the first place?

The conversion was done via commit 5b029624948d ("firmware: do not use fw_lock
for fw_state protection"). This commit overlooked this "batched request"
feature.

> That's the real problem here - swait() is a very specialized
> interface, and it does not make sense to use it here.
>
> Among all the simplifications it has is exactly the fact that it wakes
> up only one thing, because it is *so* specialized.

Not sure I follow, it can wake up all items in queue with swake_up_all(), no?

At first I *thought* we don't add further items with __prepare_to_swait() but
actually that if (list_empty(&wait->task_list)) check just checks if the
item was already added to a wait list. So swait can queue more waiters, no?

> But the *only* reason for swait is extreme memory issues and some very
> special realtime issues, where it saves a couple of bytes in
> structures that need close packing, and doesn't even use normal
> spinlocks, so it saves a couple of cycles at wakeup/sleep because it
> doesn't do a good job in general.

I see.

> The "avoid normal spinlocks" is because it is meant for code that is
> *so* special that it needs the magical low-level raw spinlocks.
>
> I really have *no* idea why the firmware code uses that idiotic
> special wait-queue. It has no reason to do so, except this comment
> from the commit that added it:
>
> "We use also swait instead of wait because don't need all the additional
> features wait provides."
>
> which is bogus, since it clearly just got the waiting wrong exactly
> *because* swait is pretty damn bad and specialized.

If indeed it cannot queue and wake all then surely this is buggered!

> I think the two valid users are RCU (which needed it for RT), and kvm
> (which also needed it for similar issues - it needs to be
> non-preemptible).

Got it.

> I don't see any similar reason for the firmware loading, and all it
> did was use an odd interface that resulted in this bug.

>From my review when this was suggested it seemed to cover all grounds
and an API that was more light weight seemed sensible. The swait documentation
seems to note "One would recommend using this wait queue where possible."
If its non suitable for multiple waits then indeed this needs fixing so
this doesn't happen again.

> Why is the firmware code being so damn odd on purpose?

I've been asking myself that since I started reviewing the firmware code :) But
in this case I believe the commit overlooked batched requests functionality and
to use swake_up_all().

Luis