Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()

From: Frank Rowand
Date: Wed Jun 12 2019 - 13:08:06 EST


On 6/12/19 7:21 AM, Greg Kroah-Hartman wrote:
> On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
>> On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@xxxxxxxxxxx> wrote:
>>>
>>> On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
>>>> On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi Saravana,
>>>>>
>>>>> On 6/10/19 10:36 AM, Rob Herring wrote:
>>>>>> Why are you resending this rather than replying to Frank's last
>>>>>> comments on the original?
>>>>>
>>>>> Adding on a different aspect... The independent replies from three different
>>>>> maintainers (Rob, Mark, myself) pointed out architectural issues with the
>>>>> patch series. There were also some implementation issues brought out.
>>>>> (Although I refrained from bringing up most of my implementation issues
>>>>> as they are not relevant until architecture issues are resolved.)
>>>>
>>>> Right, I'm not too worried about the implementation issues before we
>>>> settle on the architectural issues. Those are easy to fix.
>>>>
>>>> Honestly, the main points that the maintainers raised are:
>>>> 1) This is a configuration property and not describing the device.
>>>> Just use the implicit dependencies coming from existing bindings.
>>>>
>>>> I gave a bunch of reasons for why I think it isn't an OS configuration
>>>> property. But even if that's not something the maintainers can agree
>>>> to, I gave a concrete example (cyclic dependencies between clock
>>>> provider hardware) where the implicit dependencies would prevent one
>>>> of the devices from probing till the end of time. So even if the
>>>> maintainers don't agree we should always look at "depends-on" to
>>>> decide the dependencies, we still need some means to override the
>>>> implicit dependencies where they don't match the real dependency. Can
>>>> we use depends-on as an override when the implicit dependencies aren't
>>>> correct?
>>>>
>>>> 2) This doesn't need to be solved because this is just optimizing
>>>> probing or saving power ("we should get rid of this auto disabling"):
>>>>
>>>> I explained why this patch series is not just about optimizing probe
>>>> ordering or saving power. And why we can't ignore auto disabling
>>>> (because it's more than just auto disabling). The kernel is currently
>>>> broken when trying to use modules in ARM SoCs (probably in other
>>>> systems/archs too, but I can't speak for those).
>>>>
>>>> 3) Concerns about backwards compatibility
>>>>
>>>> I pointed out why the current scheme (depends-on being the only source
>>>> of dependency) doesn't break compatibility. And if we go with
>>>> "depends-on" as an override what we could do to keep backwards
>>>> compatibility. Happy to hear more thoughts or discuss options.
>>>>
>>>> 4) How the "sync_state" would work for a device that supplies multiple
>>>> functionalities but a limited driver.
>>>
>>> <snip>
>>> To be clear, all of above are _real_ problems that stops us from efficiently
>>> load device drivers as modules for Android.
>>>
>>> So, if 'depends-on' doesn't seem like the right approach and "going back to
>>> the drawing board" is the ask, could you please point us in the right
>>> direction?
>>
>> Use the dependencies which are already there in DT. That's clocks,
>> pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
>> going to accept duplicating all those dependencies in DT. The downside
>> for the kernel is you have to address these one by one and can't have
>> a generic property the driver core code can parse. After that's in
>> place, then maybe we can consider handling any additional dependencies
>> not already captured in DT. Once all that is in place, we can probably
>> sort device and/or driver lists to optimize the probe order (maybe the
>> driver core already does that now?).
>>
>> Get rid of the auto disabling of clocks and regulators in
>> late_initcall. It's simply not a valid marker that boot is done when
>> modules are involved. We probably can't get rid of it as lot's of
>> platforms rely on that, so it will have to be opt out. Make it the
>> platform's responsibility for ensuring a consistent state.
>>
>> Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
>> userspace in order to make progress if dependencies are missing.
>
> People have tried to do this multiple times, and you never really know
> when "boot is done" due to busses that have discoverable devices and
> async probing of other busses.
>
> You do know "something" when you pivot to a new boot disk, and when you
> try to load init, but given initramfs and the fact that modules are
> usually included on them, that's not really a good indication that
> anything is "finished".
>
> I don't want userspace to be responsible for telling the kernel, "hey
> you should be finished now!", as that's an async notification that is
> going to be ripe for problems.
>
> I really like the "depends-on" information, as it shows a topology that
> DT doesn't seem to be able to show today, yet we rely on it in the
> kernel with the whole deferred probing mess. To me, there doesn't seem
> to be any other way to properly "know" this.

Rob pointed out (above) the dependencies that are already contained in
devicetree. Those are real hardware descriptions that have to be
correct or the devices will not use the hardware that they "depend on"
(thus an incorrect hardware description _also_ results in the resource
that is not described not actually being a dependency).

If you add a second set of properties purely to list dependencies then
the second set of dependencies is likely to have latent errors that are
not initially detected (when the property is first introduced) because
the proper resources just happen to be available by sheer chance. Then
a later change in boot order resulting from a code change or configuration
change can expose that latent error.

A second set of dependency properties is making boot more fragile.

-Frank

>
>> Or
>> maybe just some timeout would be sufficient. I think this is probably
>> more useful for development than in a shipping product. Even if you
>> could fallback to polling mode instead of interrupts for example, I
>> doubt you would want to in a product.
>
> timeouts suck. And do not work for shipping products. I want a device
> with 100 modules that relys on DT to be able to boot just as fast as a
> laptop with 100 modules that has all of the needed dependancies
> described today in their bus topologies, because they used sane hardware
> (i.e. PCI and ACPI). Why hurt embedded people just because their
> hardware relies on a system where you have to loop for long periods of
> time because DT can not show the topology correctly?
>
>> You should also keep in mind that everything needed for a console has
>> to be built in. Maybe Android can say the console isn't needed, but in
>> general we can't.
>
> What does a console have to do with any of this?
>
> confused,
>
> greg k-h
>