Re: [PATCH 0/3] Fix dt-validate issues on qemu dtbdumps due to dt-bindings

From: Conor.Dooley
Date: Tue Aug 16 2022 - 18:54:00 EST


Hey Drew,
Thanks for piping up.

On 16/08/2022 15:06, Andrew Jones wrote:
> On Mon, Aug 15, 2022 at 07:18:02PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
>> Any takers on trashing my regex? Otherwise I'll just submit
>> a v2 with the regex and it can be shat on there instead :)
>>
>> On 09/08/2022 19:36, Conor Dooley wrote:
>>> On 09/08/2022 15:14, Rob Herring wrote:
>>>> On Mon, Aug 08, 2022 at 10:01:11PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
>>>>> On 08/08/2022 22:34, Jessica Clarke wrote:
>>>>>> On Fri, Aug 05, 2022 at 05:28:42PM +0100, Conor Dooley wrote:
>>>>>>> From: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
>>>>>>> The final patch adds some new ISA strings
>>>>>>> which need scrutiny from someone with more knowledge about what ISA
>>>>>>> extension strings should be reported in a dt than I have.
>>>>>>
>>>>>> Listing every possible ISA string supported by the Linux kernel really
>>>>>> is not going to scale...
>>>>
>>>> How does the kernel scale? (No need to answer)
>>>>
>>>>> Yeah, totally correct there. Case for adding a regex I suppose, but I
>>>>> am not sure how to go about handling the multi-letter extensions or
>>>>> if parsing them is required from a binding compliance point of view.
>>>>> Hoping for some input from Palmer really.
>>>>
>>>> Yeah, looks like a regex pattern is needed.
>>>
>>> I started pottering away at this but I have arrived at:
>>> rv64imaf?d?c?h?(_z[imafdqcbvkh]([a-z])*)*$
>
> Don't forget the ^ at the start.
>
> Do we need to worry about optional major and minor version numbers?
> Or check that Z names have at least one character following the category
> character? Actually, the first letter after Z being a category is only a
> convention. Maybe we don't want to enforce that. What about X extensions?

For the character after Z, I think we could operate on the assumption
that it's a convention until things change. The regex isn't set in
stone forever.
With the x extensions added it becomes the following, which to me
makes a bad thing worse:
^rv64imaf?d?q?c?b?v?k?h?(?:(?:_z[imafdqcbvkh]|_x)(?:[a-z])*)*$
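
A quick way to poke at the pattern above is a few lines of Python. This
is only a sketch: the helper name isa_matches and the sample strings are
made up for illustration, and Python's re module may not behave
identically to whatever dt-validate ends up using.

```python
import re

# Pattern copied verbatim from the message above: the single-letter
# extensions in a fixed order, followed by any number of "_z<category><name>"
# or "_x<name>" groups.
ISA_PATTERN = re.compile(
    r"^rv64imaf?d?q?c?b?v?k?h?(?:(?:_z[imafdqcbvkh]|_x)(?:[a-z])*)*$"
)


def isa_matches(isa: str) -> bool:
    """Hypothetical helper: True if the string satisfies the pattern."""
    return ISA_PATTERN.match(isa) is not None


# Strings the pattern is meant to accept.
accepted = [
    "rv64imac",
    "rv64imafdc",
    "rv64imafdch_zicsr_zifencei",
    "rv64imafdc_xvendorext",  # invented vendor extension name
]

# Strings the pattern is meant to reject.
rejected = [
    "rv64imacd",            # single letters out of order
    "rv32imac",             # only rv64 is covered
    "RV64IMAC",             # upper case is not allowed
    "rv64imac_z",           # no category letter after _z
    "rv64imafdc_zicsr2p0",  # version numbers are not handled
]

for isa in accepted + rejected:
    print(f"{isa}: {isa_matches(isa)}")
```

Note that "rv64imac_zi" and "rv64imac_x" both match, since nothing
forces a character after the category letter or after the x. That is the
gap Drew asked about; swapping the inner "*" for "+" would close it.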

and then for the version numbers it becomes completely awful.
I'd argue that if we are going to support those, then we should
do that as another regex. We are already forcing lower case in
these ISA strings - is there an actual benefit in adding the
numbers, or might we want to "encourage" removing those too?

I hope I am missing something, as my regex-fu isn't that good, but to
enforce both the ordering & the numbers, even for the simple case of
the major number only, we'd need to convert "f?" to "(?:f\d+)?" and so
on for every single extension. I don't think we can shorten that
either, as we want to enforce the ordering.

For the minor versions it goes to "(?:f\d+p\d+)?". At that point I
don't think we are adding any value but w/e, who am I to decide.
That ballooned out to 194 characters for me. I then decided to have
a bit of fun, and just do both number sets as a oneliner, using
some named match groups. That was about 255 characters. 😍
Anyway, dt-schema had a panic attack at something I was doing
so I think that /may/ be a bad idea.
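
To give a feel for how the versioned variants grow, here is a minimal
Python sketch that generates them instead of writing them by hand. The
function build_pattern and its constants are hypothetical, it assumes
that i, m and a take a version suffix as well, and it leaves the
multi-letter tail unversioned. It is not the regex I actually tried.

```python
import re

# Assumptions for illustration: i, m and a are mandatory, the rest are
# optional, and the order is the one used in the pattern earlier on.
MANDATORY = "ima"
OPTIONAL = "fdqcbvkh"
# Multi-letter tail copied from the earlier pattern, without versions.
TAIL = r"(?:(?:_z[imafdqcbvkh]|_x)(?:[a-z])*)*"


def build_pattern(version: str = "") -> str:
    """Build the ISA pattern, appending the regex fragment `version`
    after every single-letter extension."""
    required = "".join(f"{ext}{version}" for ext in MANDATORY)
    optional = "".join(f"(?:{ext}{version})?" for ext in OPTIONAL)
    return f"^rv64{required}{optional}{TAIL}$"


PLAIN = build_pattern()                  # no version numbers
MAJOR_ONLY = build_pattern(r"\d+")       # e.g. "f2"
MAJOR_MINOR = build_pattern(r"\d+p\d+")  # e.g. "f2p0"

for name, pattern in [
    ("plain", PLAIN),
    ("major only", MAJOR_ONLY),
    ("major and minor", MAJOR_MINOR),
]:
    print(f"{name}: {len(pattern)} characters")

print(bool(re.match(MAJOR_ONLY, "rv64i2m2a2f2d2c2")))
print(bool(re.match(MAJOR_MINOR, "rv64i2p0m2p0a2p0f2p0d2p0c2p0")))
```

Each variant here makes the version mandatory wherever it is used, so
accepting unversioned, major-only and major.minor strings at once means
either three patterns or nesting optional groups inside every extension.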

I vote for allowing the x extensions, keeping the convention for
standard extensions & revisiting this in the future if needed...

Thanks,
Conor.


>
> Thanks,
> drew
>
>>>
>>> I suspect that before "h?" there should be more single letter
>>> extensions added for completeness sake. So then it'd bloat out to:
>>> rv64imaf?d?q?c?b?v?k?h?(_z[imafdqcbvkh]([a-z])*)*$
>>>
>>> I checked a couple different "bad" isa strings against it and
>>> nothing went up in flames but my regex skills are far from great
>>> so I'm sure there's better ways to represent this.
>>>
>>> Anyways, this pattern is based on my understanding that:
>>> - the single letter order is fixed & we don't care about things that
>>> can't even do "ima"
>>> - the multi letter extensions are all in a "_z<foo>" format where the
>>> first letter of <foo> is a valid single letter extension
>>> - we don't care about the e extension from an OS PoV (this could be a
>>> very flawed take...)
>>> - after the first two chars, the extension name could be an english
>>> word (ifencei anyone?) so it's not worth restricting the charset
>>> - that attempting to validate the contents of the multiletter extensions
>>> with dt-validate beyond the formatting is a futile, massively verbose
>>> or unwieldy exercise at best
>>>
>>> Some or all of those assumptions could be very very wrong so if {someone,
>>> anyone} wants to correct me - feel ***more*** than free..
>>>
>>> Thanks,
>>> Conor.
>>>
>>> patch would then look like:
>>>
>>> diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
>>> index d632ac76532e..1e54e7746190 100644
>>> --- a/Documentation/devicetree/bindings/riscv/cpus.yaml
>>> +++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
>>> @@ -74,9 +74,7 @@ properties:
>>> insensitive, letters in the riscv,isa string must be all
>>> lowercase to simplify parsing.
>>> $ref: "/schemas/types.yaml#/definitions/string"
>>> - enum:
>>> - - rv64imac
>>> - - rv64imafdc
>>> + pattern: rv64imaf?d?q?c?b?v?k?h?(_z[imafdqcbvkh]([a-z])*)*$
>>>
>>> # RISC-V requires 'timebase-frequency' in /cpus, so disallow it here
>>> timebase-frequency: false
>>