Re: [RFC] yamldt v0.5, now a DTS compiler too

From: Pantelis Antoniou
Date: Sun Oct 22 2017 - 14:23:43 EST


Hi Grant,

> On Oct 22, 2017, at 19:54 , Grant Likely <grant.likely@xxxxxxxxxxxx> wrote:
>
> On Fri, Oct 20, 2017 at 8:16 PM, Pantelis Antoniou
> <pantelis.antoniou@xxxxxxxxxxxx> wrote:
>> Hi Grant,
>>
>>> On Oct 20, 2017, at 20:46 , Grant Likely <grant.likely@xxxxxxxxxxxx> wrote:
>>>
>>> On Thu, Sep 28, 2017 at 8:58 PM, Pantelis Antoniou
>>> <pantelis.antoniou@xxxxxxxxxxxx> wrote:
>>>> Hello again,
>>>>
>>>> Significant progress has been made on yamldt and is now capable of
>>>> not only generating yaml from DTS source but also compiling DTS sources
>>>> and being almost fully compatible with DTC.
>>>>
>>>> Compiling the kernel's DTBs using yamldt is as simple as using a
>>>> DTC=yamldt.
>>>>
>>>> Error reporting is accurate and validation against a YAML based schema
>>>> works as well. In a short while I will begin posting patches with
>>>> fixes on bindings and DTS files in the kernel.
>>>>
>>>> Please try it on your platform and report if you encounter any problems.
>>>>
>>>> https://github.com/pantoniou/yamldt
>>>>
>>>> I am eagerly awaiting for your comments.
>>>
>>> Hi Pantelis,
>>>
>>> This is good work. I've played around with it and I'm looking forward
>>> to chatting next week.
>>>
>>
>> Thanks. Iâm looking forward to it too.
>>
>>> One thing I've done is tried loading the output YAML files into
>>> another YAML interpreter and the current encoding causes problems.
>>> Specifically, in yamldt anchors/aliases are being used as a
>>> replacement for labels/phandles, but that conflicts with the YAML data
>>> model which defines a reference as a way to make a copy of the data
>>> appear in another part of the tree. For example, for the following
>>> snippit:
>>>
>>> intc: intc@10000 {
>>> #interrupt-cells = <1>;
>>> compatible = "acme,intc";
>>> reg = <0x10000 0x1000>;
>>> gpio-controller;
>>> };
>>>
>>> serial@20000 {
>>> compatible = "acme,uart";
>>> reg = <0x20000 0x1000>;
>>> interrupt-parent = <&intc>;
>>> interrupts = <5>;
>>> };
>>>
>>> yamldt will encode this as:
>>>
>>> intc@10000: &intc
>>> "#interrupt-cells": 1
>>> compatible: acme,intc
>>> reg: [0x10000, 0x1000]
>>> gpio-controller:
>>>
>>> serial@20000:
>>> compatible: acme,uart
>>> reg: [0x20000, 0x1000]
>>> interrupt-parent: *intc
>>> interrupts: 5
>>>
>>> But, the expected behaviour for a YAML parser is expand the alias
>>> '*intc' which results in the following structure:
>>>
>>> intc@10000: &intc
>>> "#interrupt-cells": 1
>>> compatible: acme,intc
>>> reg: [0x10000, 0x1000]
>>> gpio-controller:
>>>
>>> serial@20000:
>>> compatible: acme,uart
>>> reg: [0x20000, 0x1000]
>>> interrupt-parent:
>>> "#interrupt-cells": 1
>>> compatible: acme,intc
>>> reg: [0x10000, 0x1000]
>>> gpio-controller:
>>> interrupts: 5
>>>
>>> See? It results in the entire interrupt controller node to appear as
>>> an instance under the interrupt-parent property, when the intention is
>>> only to create a phandle. yamldt should not redefine the behaviour of
>>> '*' aliases. Instead, it should use a different indicator, either
>>> using an explicit !phandle tag, or by replacing '*' with something
>>> else. I worked around it in my tests by replacing '*' with '$â.
>>
>> Yes. This is expected. I donât think pure YAML form is a good match for all
>> the crazy things that are possible (and actually used in practice) with DTS.
>
> I donât think it is as dire as that. The DTS structure is not complex
> and I think can easily be mapped into pure YAML. But, it does require
> treating the DTS structure separately from its semantic meaning. For
> example, the grammar of nodes, properties and labels easily maps to
> pure YAML, but phandles and overlay trees have semantic meaning that
> must be resolved by DT specific code. Iâll respond to youâre specific
> examples belowâ
>

We are in complete agreement here. Single nodes and properties map to YAML perfectly.
Itâs the complex way that we build the complete DTB that stress the YAML structures.

>>
>> For instance thereâs no way a normal YAML parser wonât choke with something like
>>
>> / {
>> foo;
>> };
>>
>> / {
>> bar;
>> };
>>
>> Which is a common idiom in DTS files.
>
> Thatâs only true when the top level of nodes is encoded as a map, but
> it shouldnât be. It should be a list instead, for two reasons. First,
> order matters for the top level. Subsequent top level trees have to be
> applied in order to fully resolve the tree, but a map doesnât preserve
> the ordering. Second, as you rightly point out, the same name can be
> used for multiple top level trees. So, the encoding should be a list
> with each list entry containing the node path {/path,&ref}, and the
> node data. One possible way to do this is with a tuple:
>
> treedata:
> - - /
> - foo:
> - - /
> - bar:
>
> Another option is to add a special property in the node to contain the name:
>
> treedata:
> - $path: /
> foo:
> - $path: /
> bar:
>
> Personally, I prefer the special property approach. That would also be
> a good way to encode labels
>

The YAML bare sequence is problematic; Iâll explain below.

>> The decision to use the YAML anchors and references was more borne out of a desire
>> to sort of match conceptually the labels and references of DTS. Itâs not a big
>> issue to switch to something different.
>
> I think it would be a good idea. One reason for defining a YAML
> encoding is to be able to use existing tools to work with the data.
> Changing the meaning of anchors and aliases breaks those tools
> immediately.
>

Hmm, I see what you mean. Semantically this mapping is more correct but it does
present problems to non-aware tools.

>> If we were to force YAML/DT file to be regular YAML files parseable by everything
>> weâd have to rethink a lot of conventions that DT files currently use and Iâm afraid
>> a machine translation from DTS to YAML might not be feasible then.
>>
>> OTOH moving to that model might make it possible to use YAML constructs that are not
>> mapped at all to DTS. For instance
>>
>> - foo: true
>> bar: âstringâ
>>
>> - foo: false
>> bar: âanother-stringâ
>>
>> is not possible to be mapped to a DT node/property structure right now.
>
> Iâm not following. Are you saying DT has no way to encode a list of
> nodes? What use case are you imagining?
>

This threw me off too at first.

The problem is when you try to convert this into a live tree structure.

The YAML encoding is

+SEQ
+MAP
=VAL :foo
=VAL :true
=VAL :bar
=VAL "string
-MAP
+MAP
=VAL :foo
=VAL :false
=VAL :bar
=VAL "another-string
-MAP
-SEQ

So itâs a sequence of MAPs.

In a live tree DTS form would be

noname-property = { foo=true; bar = âstringâ; }, { foo = false; bar=âanother-stringâ; };

We donât have the capability as right now to express something like this.

Namely properties containing nodes, and the root of the live tree not being a node.

>>
>>>
>>
>>> Plus, it would be useful to use normal YAML anchors/aliases for
>>> creating node templates. For example:
>>>
>>> serial-template: &acme-uart . # The anchor for the template
>>> compatible: acme,uart
>>> interrupt-parent: *intc
>>>
>>> root:
>>> serial@20000:
>>> <<: *acme-uart # Alias node merged into serial@20000
>>> interrupts: 5
>>> reg: [0x20000, 0x1000]
>>> serial@30000:
>>> <<: *acme-uart # Alias node merged into serial@30000
>>> interrupts: 5
>>> reg: [0x30000, 0x1000]
>>>
>>
>> Yes, Iâm aware of this and I planned to talk to you about it :)
>> It can be a very powerful way to cut down the churn of DT right now.
>>
>> Itâs not going to be a problem for yamldt to support it (perhaps
>> adding a new editing tags to allow more fine grained control of the
>> substitions).
>>
>>> Another problem with anchors/references is YAML seems to require the
>>> anchor to be defined before the reference, or at least that's what
>>> pyyaml and ruamel both expect. Regardless, The chosen YAML encoding
>>> should be readily consumable by existing yaml implementations without
>>> having to do a lot of customization.
>>>
>>
>> I think this is only true for streamable YAML implementations; yamldt
>> can leave references unresolved until the emit phase (or even leave them
>> be as object files).
>
> it is possible (probable) that Iâm just not familiar with how to
> change that behaviour in pyyaml and ruamel. However, the point is moot
> if we can agree to preserve pure YAML semantics.
>

Agreed.

>>> I'm slightly concerned about using & anchors for labels because it
>>> seems only one anchor can be defined per node, but DTC allows multiple
>>> labels for a single node. This might not be an issue in practice
>>> though. Another implementation issue related to using & anchors is the
>>> YAML spec defines them as an encoding artifact, and parsers can
>>> discard the anchor names after parsing the YAML structure, which is a
>>> problem if we use something like $name to reference an anchor. The
>>> solution might just be that labels need to go into a special property
>>> so they don't disappear from the data stream.
>>>
>>
>> I do support multiple labels to the same node in yamldt with the
>> following trick, that uses an empty tree.
>>
>> foo: bar: baz { frob; };
>>
>> bar: &foo
>> frob: true
>>
>> bar: &bar
>> ~: ~
>>
>> Special properties should work, albeit being a little bit cludgy.
>
> With special properties, it might look like this:
>
> baz:
> $labels: [foo, bar]
> frob:
>

Yeah, that would work.

>>
>>> There appears to be no place to put metadata. The root of the tree is
>>> the top level of the YAML structure. There isn't any provision for
>>> having a top level object to hold things like the memreserve map. We
>>> may need a namespace to use for special properties that aren't nodes
>>> or properties.
>>>
>>
>> I donât think thatâs going to be necessary. Metadata are artifacts that
>> only have meaning when emitting a serialized form like dtb.
>>
>> yamldt fully supports /memreserve/ by simply treating a top level
>> /memreserve/ property as special and constructing the memreserve
>> entries at the emit phase.
>>
>> so
>>
>> /memreserve/ 1000 10;
>>
>> is simply:
>>
>> /memreserve/: [ 1000 10 ]
>>
>> Itâs a matter of definition what properties are encoded as metadata for the
>> serialized form.
>>
>>> The encoding differentiates between nodes and properties implicitly
>>> base on whether the contents are a map, or a scalar/list. This does
>>> mean any parser needs to do a bit more work and it may impact what can
>>> be done with validation (I'm not going to talk about validation in
>>> this email though. We'll talk next week.)
>>
>> Hmm, Iâm not sure I follow this. Is that related to the metadata problem.
>
> It has to do with iterating over properties or nodes. Nodes and props
> are mixed together, and any code processing them has to explicitly
> check the type to differentiate between the two. The alternative would
> be to collect all properties under a single key:
>
> parent:
> $props:
> foo: hello
> bar: world
> child1: {}
> child2: {}
>

Thatâs no problem. It does look weird though. I am using the same parser
for parsing bindings and this might make the files a little bit weird.

Keep in mind that we have two YAML formats to parse; the hardware description
and the bindings describing them.

>>
>> BTW, a blog post explaining the rationale behind yamldt is going to come
>> up soon at our website, Iâll post the link when it does.
>
> Cool. I see it got posted, but didnât get a chance to read it before I
> got on this airplane. Will try to read this evening.
>
> (BTW, Iâm arriving at about 11:00 tonight)
>

Have a safe flight, youâll get to beat me up tomorrow :)

> Cheers,
> g.

Regards

â Pantelis