Re: [PATCH 00/15] Implement MODVERSIONS for Rust
From: Petr Pavlu
Date: Mon Jul 22 2024 - 04:20:57 EST
On 7/15/24 22:39, Sami Tolvanen wrote:
> On Wed, Jul 10, 2024 at 7:30 AM Petr Pavlu <petr.pavlu@xxxxxxxx> wrote:
>> On 6/17/24 19:58, Sami Tolvanen wrote:
>>> The first 12 patches of this series add a small tool for computing
>>> symbol versions from DWARF, called gendwarfksyms. When passed a list
>>> of exported symbols, the tool generates an expanded type string
>>> for each symbol, and computes symbol CRCs similarly to genksyms.
>>> gendwarfksyms is written in C and uses libdw to process DWARF, mainly
>>> because of the existing support for C host tools that use elfutils
>>> (e.g., objtool).
>>
>> In addition to calculating CRCs of exported symbols, genksyms has other
>> features which I think are important.
>>
>> Firstly, the genksyms tool has a human-readable storage format for input
>> data used in the calculation of symbol CRCs. Setting the make variable
>> KBUILD_SYMTYPES enables dumping this data and storing it in *.symtypes
>> files.
>>
>> When a developer later modifies the kernel and wants to check if some
>> symbols have changed, they can take these files and feed them as
>> *.symref back to genksyms. This allows the tool to provide an actual
>> reason why some symbols have changed, instead of just printing that
>> their CRCs are different.
>>
>> Is there any plan to add the same functionality to gendwarfksyms, or do
>> you envison that people will use libabigail, Symbol-Type Graph, or
>> another tool for making this type of comparison?
>
> gendwarfksyms also uses human-readable input for the CRC calculations,
> and it prints out the input strings with the --debug option. I plan to
> hook this up to KBUILD_SYMTYPES in v2. It should be convenient enough
> to simply compare the pretty-printed output with diff, so I'm not sure
> if a built-in comparison option is needed. Any other DWARF analysis
> tool can be used to spot the differences too, as you mentioned.
>From my perspective, I'm okay if gendwarfksyms doesn't provide
functionality to compare a new object file with its reference symtypes
file.
As mentioned, genksyms has this functionality but I actually think the
way it works is not ideal. Its design is to operate on one compilation
unit at the time. This has the advantage that a comparison of each file
is performed in parallel during the build, simply because of the make
job system. On the other hand, it has two problems.
The first one is that genksyms doesn't provide a comparison of the
kernel as a whole. This means that the tool gives rather scattered and
duplicated output about changed structs in the build log. Ideally, one
would like to see a single compact report about what changed at the end
of the build.
The second problem is the handling of symtypes files. This data is large
and if one wants to store them in a Git repository together with the
kernel source, it is advisable to first compress/consolidate it in some
way. This is trivial because these files typically contain many
duplicates. However, the issue is that to feed the data back to
genksyms, they need to be unpacked during each build which can take some
time.
I think a better approach is to have a tool that can be given
a consolidated symtypes file as one input and can compare it with all
new symtypes files produced during a kernel build. An example of a tool
that takes this approach is the kabi Python script in UEK [1].
A few months ago, I also started working on a tool inspired by this
script. The goal is to have similar functionality but hopefully with
a much faster implementation. Hence, this tool is written in a compiled
language (Rust at the moment) and should also become multi-threaded. I'm
hoping to find some time to make progress on it and make the code
public. It could later be added to the upstream kernel to replace the
comparison functionality implemented by genksyms, if there is interest.
So as mentioned, I'm fine if gendwarfksyms doesn't have this
functionality. However, for distributions that rely on the symtypes
format, I'd be interested in having gendwarfksyms output its dump data
in this format as well.
For example, instead of producing:
gendwarfksyms: process_exported_symbols: _some_mangled_func_name (@ XYZ)
subprogram(
[formal parameters...]
)
-> structure_type core::result::Result<(), core::fmt::Error> {
[a description of the structure...]
};
.. the output could be something like this:
S#'core::result::Result<(), core::fmt::Error>' structure_type core::result::Result<(), core::fmt::Error> { [a description of the structure...] }
_some_mangled_func_name subprogram _some_mangled_func_name ( [formal parameters...] ) -> S#'core::result::Result<(), core::fmt::Error>'
>> Secondly, when distributions want to maintain stable kABI, they need to
>> be able to deal with patch backports that add new members to structures.
>> One common approach is to have placeholders in important structures
>> which can be later replaced by the new members as needed. __GENKSYMS__
>> ifdefs are then used at the C source level to hide these kABI-compatible
>> changes from genksyms.
>>
>> Gendwarfksyms works on the resulting binary and so using such ifdefs
>> wouldn't work. Instead, I suspect that what is required is a mechanism
>> to tell the tool that a given change is ok, probably by allowing to
>> specify some map from the original definition to the new one.
>>
>> Is there a plan to implement something like this, or how could it be
>> addressed?
>
> That's a great question. Here's what Android uses currently to
> maintain a stable kABI, I assume you're doing something similar?
Correct, (open)SUSE kernels have placeholders in likely-to-change
structs which can be used for new members. Or if no placeholder is
present, it might be necessary to place a new member in a gap (padding)
in the struct layout.
>
> https://android.googlesource.com/kernel/common/+/refs/heads/android15-6.6/include/linux/android_kabi.h
>
> If using unions here is acceptable to everyone, a simple solution
> would be to use a known name prefix for the reserved members and teach
> gendwarfksyms to only print out the original type for the replaced
> ones. For example:
>
> The initial placeholder:
>
> u8 __kabi_reserved_1[8];
>
> After replacement:
>
> union {
> u64 new_member;
> struct {
> u8 __kabi_reserved_1[8];
> };
> }
>
> Here gendwarfksyms would see the __kabi_reserved prefix and only use
> u8 [8] for the CRC calculation. Does this sound reasonable?
I like this idea. I think it's good that the necessary kABI information
about an updated member can be expressed at the source code level in
place of the actual change, and it isn't needed to feed additional input
to the tool.
[1] https://github.com/oracle/linux-uek/blob/dbdd7f3611cb03e607e156834497dd2767103530/uek-rpm/tools/kabi
Thanks,
Petr