Re: [PATCH] irq_bcm2836: Send event when onlining sleeping cores

From: Marc Zyngier
Date: Wed May 10 2017 - 13:15:15 EST


On 10/05/17 17:21, Florian Fainelli wrote:
> On 05/10/2017 03:31 AM, Phil Elwell wrote:
>> On 10/05/2017 11:09, Marc Zyngier wrote:
>>> On 10/05/17 10:05, Phil Elwell wrote:
>>>> On 10/05/2017 09:55, Marc Zyngier wrote:
>>>>> On Wed, May 10 2017 at 9:27:10 am BST, Phil Elwell <phil@xxxxxxxxxxxxxxx> wrote:
>>>>>> On 10/05/2017 08:42, Marc Zyngier wrote:
>>>>>>> On 09/05/17 20:02, Phil Elwell wrote:
>>>>>>>> On 09/05/2017 19:53, Marc Zyngier wrote:
>>>>>>>>> On 09/05/17 19:52, Phil Elwell wrote:
>>>>>>>>>> On 09/05/2017 19:14, Marc Zyngier wrote:
>>>>>>>>>>> On 09/05/17 19:08, Eric Anholt wrote:
>>>>>>>>>>>> Marc Zyngier <marc.zyngier@xxxxxxx> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/05/17 17:59, Eric Anholt wrote:
>>>>>>>>>>>>>> Phil Elwell <phil@xxxxxxxxxxxxxxx> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In order to reduce power consumption and bus traffic, it is sensible
>>>>>>>>>>>>>>> for secondary cores to enter a low-power idle state when waiting to
>>>>>>>>>>>>>>> be started. The wfe instruction causes a core to wait until an event
>>>>>>>>>>>>>>> or interrupt arrives before continuing to the next instruction.
>>>>>>>>>>>>>>> The sev instruction sends a wakeup event to the other cores, so call
>>>>>>>>>>>>>>> it from bcm2836_smp_boot_secondary, the function that wakes up the
>>>>>>>>>>>>>>> waiting cores during booting.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is harmless to use this patch without the corresponding change
>>>>>>>>>>>>>>> adding wfe to the ARMv7/ARMv8-32 stubs, but if the stubs are updated
>>>>>>>>>>>>>>> and this patch is not applied then the other cores will sleep forever.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> See: https://github.com/raspberrypi/linux/issues/1989
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Phil Elwell <phil@xxxxxxxxxxxxxxx>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>> drivers/irqchip/irq-bcm2836.c | 3 +++
>>>>>>>>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/drivers/irqchip/irq-bcm2836.c b/drivers/irqchip/irq-bcm2836.c
>>>>>>>>>>>>>>> index e10597c..6dccdf9 100644
>>>>>>>>>>>>>>> --- a/drivers/irqchip/irq-bcm2836.c
>>>>>>>>>>>>>>> +++ b/drivers/irqchip/irq-bcm2836.c
>>>>>>>>>>>>>>> @@ -248,6 +248,9 @@ static int __init bcm2836_smp_boot_secondary(unsigned int cpu,
>>>>>>>>>>>>>>> writel(secondary_startup_phys,
>>>>>>>>>>>>>>> intc.base + LOCAL_MAILBOX3_SET0 + 16 * cpu);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> + dsb(sy); /* Ensure write has completed before waking the other CPUs */
>>>>>>>>>>>>>>> + sev();
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is also the behavior that the standard arm64 spin-table
>>>>>>>>>>>>>> method has,
>>>>>>>>>>>>>> which we unfortunately can't quite use.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And why is that so? Why do you have to reinvent the wheel (and hide the
>>>>>>>>>>>>> cloned wheel in an interrupt controller driver)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> That doesn't seem right to me.
>>>>>>>>>>>>
>>>>>>>>>>>> The armv8 stubs (firmware-supplied code in the low page that do the
>>>>>>>>>>>> spinning) do actually implement arm64's spin-table method. It's the
>>>>>>>>>>>> armv7 stubs that use these registers in the irqchip instead of plain
>>>>>>>>>>>> addresses in system memory.
>>>>>>>>>>>
>>>>>>>>>>> Let's put ARMv7 aside for the time being. If your firmware already
>>>>>>>>>>> implements spin-tables, why don't you simply use that at least on arm64?
>>>>>>>>>>
>>>>>>>>>> We do.
>>>>>>>>>
>>>>>>>>> Obviously not the way it is intended if you have to duplicate the core
>>>>>>>>> architectural code in the interrupt controller driver, which couldn't
>>>>>>>>> care less.
>>>>>>>>
>>>>>>>> If we were using this method on arm64 then the other cores would not start up
>>>>>>>> because armstub8.S has always included a wfe. Nothing in the commit mentions
>>>>>>>> arm64 - this is an ARCH=arm fix.
>>>>>>>
>>>>>>> Thanks for the clarification, which you could have added to the commit
>>>>>>> message.
>>>>>>>
>>>>>>> The question still remains: why do we have CPU bring-up code in an
>>>>>>> interrupt controller, instead of having it in the architecture code?
>>>>>>>
>>>>>>> The RPi-2 is the *only* platform to have its SMP bringup code outside of
>>>>>>> arch/arm, so the first course of action would be to move that code where
>>>>>>> it belongs.
>>>>>>
>>>>>> You were CC'd on the commit (41f4988cc287e5f836d3f6620c9f900bc9b560e9) that
>>>>>> introduced bcm2836_smp_boot_secondary - it seems strange to start objecting
>>>>>> now.
>>>>>
>>>>> Well, I'm far from being perfect. If I had noticed it, I'd have NACKed
>>>>> it.
>>>>>
>>>>>> Yes, I think it is odd that it didn't go into arch/arm/mach-bcm, but in
>>>>>> the interests of making changes in small, independent steps, do you have a
>>>>>> problem with this commit?
>>>>>
>>>>> On its own, no. I'm just not keen on adding more unrelated stuff to this
>>>>> file, so let's start with dealing with the original bug, and you can
>>>>> then add this fix on top.
>>>>
>>>> That's an interesting use of the word "bug". From Wikipedia:
>>>>
>>>> "A software bug is an error, flaw, failure or fault in a computer program or
>>>> system that causes it to produce an incorrect or unexpected result, or to
>>>> behave in unintended ways."
>>>
>>> Whatever. Should I call it "pile of crap dumped in unsuitable locations"
>>> instead? What does Wikipedia says about it?
>>>
>>>> Although your concerns are valid, the faults you are objecting to are not causing
>>>> a malfunction of any kind. If we were to update the RPi firmware before this
>>>> patch was merged then upstream users would be left with one wheel on their wagon.
>>>
>>> And that'd be your problem, not mine. Look, you can argue around this
>>> all day, or you can fix this mess. Your choice.
>>
>> Is that the opinion of all here?
>
> The choice of word here got largely out of the original topic and I
> surely did eat a ton of popcorn here. There are two things that need

Always happy to entertain.

> fixing, and the time line and process for fixing these is clear:
>
> - your bugfix (Phil) is something that should be applied now, and
> backported to -stable trees once the fix hits the irqchip tree (or Linus')

Why? As far as I can tell, none of the current platforms are affected,
since the corresponding firmware hasn't been distributed. So by the
letter of this, this patch is 4.13 material too.

> - relocating the code that does the secondary boot out of
> drivers/irqchip/ into arch/arm/mach-bcm/ needs to happen (Marc), and
> this is 4.13 material, there is no urgency in doing this *right now*,
> but it needs to happen
>
> Does that work for everyone?

I'm strongly opposed to it. Moving code over to arch/arm/ has zero risk,
and removes crap from the existing code. Hell, make the whole thing one
patch if that makes it easier for you to backport. But piling more crap
there? No, thank you.

M.
--
Jazz is not dead. It just smells funny...