Re: [PATCH] irq_bcm2836: Send event when onlining sleeping cores

From: Eric Anholt
Date: Wed May 10 2017 - 14:03:04 EST


Florian Fainelli <f.fainelli@xxxxxxxxx> writes:

> On 05/10/2017 03:31 AM, Phil Elwell wrote:
>> On 10/05/2017 11:09, Marc Zyngier wrote:
>>> On 10/05/17 10:05, Phil Elwell wrote:
>>>> On 10/05/2017 09:55, Marc Zyngier wrote:
>>>>> On Wed, May 10 2017 at 9:27:10 am BST, Phil Elwell <phil@xxxxxxxxxxxxxxx> wrote:
>>>>>> On 10/05/2017 08:42, Marc Zyngier wrote:
>>>>>>> On 09/05/17 20:02, Phil Elwell wrote:
>>>>>>>> On 09/05/2017 19:53, Marc Zyngier wrote:
>>>>>>>>> On 09/05/17 19:52, Phil Elwell wrote:
>>>>>>>>>> On 09/05/2017 19:14, Marc Zyngier wrote:
>>>>>>>>>>> On 09/05/17 19:08, Eric Anholt wrote:
>>>>>>>>>>>> Marc Zyngier <marc.zyngier@xxxxxxx> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/05/17 17:59, Eric Anholt wrote:
>>>>>>>>>>>>>> Phil Elwell <phil@xxxxxxxxxxxxxxx> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In order to reduce power consumption and bus traffic, it is sensible
>>>>>>>>>>>>>>> for secondary cores to enter a low-power idle state when waiting to
>>>>>>>>>>>>>>> be started. The wfe instruction causes a core to wait until an event
>>>>>>>>>>>>>>> or interrupt arrives before continuing to the next instruction.
>>>>>>>>>>>>>>> The sev instruction sends a wakeup event to the other cores, so call
>>>>>>>>>>>>>>> it from bcm2836_smp_boot_secondary, the function that wakes up the
>>>>>>>>>>>>>>> waiting cores during booting.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is harmless to use this patch without the corresponding change
>>>>>>>>>>>>>>> adding wfe to the ARMv7/ARMv8-32 stubs, but if the stubs are updated
>>>>>>>>>>>>>>> and this patch is not applied then the other cores will sleep forever.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> See: https://github.com/raspberrypi/linux/issues/1989
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Phil Elwell <phil@xxxxxxxxxxxxxxx>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>> drivers/irqchip/irq-bcm2836.c | 3 +++
>>>>>>>>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/drivers/irqchip/irq-bcm2836.c b/drivers/irqchip/irq-bcm2836.c
>>>>>>>>>>>>>>> index e10597c..6dccdf9 100644
>>>>>>>>>>>>>>> --- a/drivers/irqchip/irq-bcm2836.c
>>>>>>>>>>>>>>> +++ b/drivers/irqchip/irq-bcm2836.c
>>>>>>>>>>>>>>> @@ -248,6 +248,9 @@ static int __init bcm2836_smp_boot_secondary(unsigned int cpu,
>>>>>>>>>>>>>>> writel(secondary_startup_phys,
>>>>>>>>>>>>>>> intc.base + LOCAL_MAILBOX3_SET0 + 16 * cpu);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> + dsb(sy); /* Ensure write has completed before waking the other CPUs */
>>>>>>>>>>>>>>> + sev();
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is also the behavior that the standard arm64 spin-table
>>>>>>>>>>>>>> method has,
>>>>>>>>>>>>>> which we unfortunately can't quite use.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And why is that so? Why do you have to reinvent the wheel (and hide the
>>>>>>>>>>>>> cloned wheel in an interrupt controller driver)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> That doesn't seem right to me.
>>>>>>>>>>>>
>>>>>>>>>>>> The armv8 stubs (firmware-supplied code in the low page that do the
>>>>>>>>>>>> spinning) do actually implement arm64's spin-table method. It's the
>>>>>>>>>>>> armv7 stubs that use these registers in the irqchip instead of plain
>>>>>>>>>>>> addresses in system memory.
>>>>>>>>>>>
>>>>>>>>>>> Let's put ARMv7 aside for the time being. If your firmware already
>>>>>>>>>>> implements spin-tables, why don't you simply use that at least on arm64?
>>>>>>>>>>
>>>>>>>>>> We do.
>>>>>>>>>
>>>>>>>>> Obviously not the way it is intended if you have to duplicate the core
>>>>>>>>> architectural code in the interrupt controller driver, which couldn't
>>>>>>>>> care less.
>>>>>>>>
>>>>>>>> If we were using this method on arm64 then the other cores would not start up
>>>>>>>> because armstub8.S has always included a wfe. Nothing in the commit mentions
>>>>>>>> arm64 - this is an ARCH=arm fix.
>>>>>>>
>>>>>>> Thanks for the clarification, which you could have added to the commit
>>>>>>> message.
>>>>>>>
>>>>>>> The question still remains: why do we have CPU bring-up code in an
>>>>>>> interrupt controller, instead of having it in the architecture code?
>>>>>>>
>>>>>>> The RPi-2 is the *only* platform to have its SMP bringup code outside of
>>>>>>> arch/arm, so the first course of action would be to move that code where
>>>>>>> it belongs.
>>>>>>
>>>>>> You were CC'd on the commit (41f4988cc287e5f836d3f6620c9f900bc9b560e9) that
>>>>>> introduced bcm2836_smp_boot_secondary - it seems strange to start objecting
>>>>>> now.
>>>>>
>>>>> Well, I'm far from being perfect. If I had noticed it, I'd have NACKed
>>>>> it.
>>>>>
>>>>>> Yes, I think it is odd that it didn't go into arch/arm/mach-bcm, but in
>>>>>> the interests of making changes in small, independent steps, do you have a
>>>>>> problem with this commit?
>>>>>
>>>>> On its own, no. I'm just not keen on adding more unrelated stuff to this
>>>>> file, so let's start with dealing with the original bug, and you can
>>>>> then add this fix on top.
>>>>
>>>> That's an interesting use of the word "bug". From Wikipedia:
>>>>
>>>> "A software bug is an error, flaw, failure or fault in a computer program or
>>>> system that causes it to produce an incorrect or unexpected result, or to
>>>> behave in unintended ways."
>>>
>>> Whatever. Should I call it "pile of crap dumped in unsuitable locations"
>>> instead? What does Wikipedia says about it?
>>>
>>>> Although your concerns are valid, the faults you are objecting to are not causing
>>>> a malfunction of any kind. If we were to update the RPi firmware before this
>>>> patch was merged then upstream users would be left with one wheel on their wagon.
>>>
>>> And that'd be your problem, not mine. Look, you can argue around this
>>> all day, or you can fix this mess. Your choice.
>>
>> Is that the opinion of all here?
>
> The choice of word here got largely out of the original topic and I
> surely did eat a ton of popcorn here. There are two things that need
> fixing, and the time line and process for fixing these is clear:
>
> - your bugfix (Phil) is something that should be applied now, and
> backported to -stable trees once the fix hits the irqchip tree (or Linus')
>
> - relocating the code that does the secondary boot out of
> drivers/irqchip/ into arch/arm/mach-bcm/ needs to happen (Marc), and
> this is 4.13 material, there is no urgency in doing this *right now*,
> but it needs to happen
>
> Does that work for everyone?

Agreed. This patch, which we'll want to go to -stable, should clearly
go in first. Marc's patch can go in after, since it's not a -stable
candidate.

Thomas, could you add the cc to stable when picking this patch?

Attachment: signature.asc
Description: PGP signature