RE: spdxcheck: python git module considered harmful (was RE: [PATCH] scripts/spdxcheck: Limit the scope of git.Repo)

From: Thomas Gleixner
Date: Wed Apr 09 2025 - 16:26:12 EST


Tim!

On Wed, Apr 09 2025 at 17:44, Tim Bird wrote:
>> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> On Tue, Apr 08 2025 at 17:34, Tim Bird wrote:
>> And yes, it ignores not yet tracked files, but if you want to check
>> them, then it's easy enough to commit them temporarily or to pass a
>> dedicated file target to the tool, which bypasses git.
>
> OK. Yes. That's an easy workaround.

Actually spdxcheck supports that already:

    scripts/spdxcheck.py path/to/file
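
For illustration only, here is a minimal sketch of what checking a single
file without consulting git can look like. It is not the code in
scripts/spdxcheck.py. The helper name find_spdx_id, the 15-line scan limit
and the regular expression are assumptions made for this example, and it
only looks for the tag; it does not validate the license expression.

```python
#!/usr/bin/env python3
# Hypothetical helper for illustration; not part of scripts/spdxcheck.py.
import re
import sys

# Assumption: the tag appears within the first few lines of the file.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(\S.*)")
MAX_LINES = 15


def find_spdx_id(path, max_lines=MAX_LINES):
    """Return the text following the first SPDX tag, or None if absent."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if lineno > max_lines:
                break
            match = SPDX_RE.search(line)
            if match:
                # Drop a trailing C comment closer such as "*/".
                return match.group(1).replace("*/", "").strip()
    return None


if __name__ == "__main__":
    # Files are named explicitly, so untracked files are covered too.
    for path in sys.argv[1:]:
        ident = find_spdx_id(path)
        print(f"{path}: {ident if ident else 'no SPDX identifier found'}")
```

Because the paths come straight from the command line, files which are not
yet tracked are checked the same way as tracked ones.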

>> Good luck for coming up with a clever and clean solution for that!
>
> I thought about various solutions for this, but each one I came up
> with had other drawbacks. If it was just a matter of separating
> *.[chS] files from ELF object files, that would be easy to deal with.
> But we put SPDX headers on all kinds of files, and there are lots
> of other types of files generated during a build that are not just
> ELF objects. And build rules change over time. So even if I made
> a comprehensive system today to catch build-generated outliers,
> the solution would probably need constant updating and tweaking, which
> IMHO makes it a no-go.

I'm glad that I'm not the only one who came to this conclusion :)

>> Just for the record: I would rather see people contribute to
>> eliminating the remaining 17% (15397 files) which do not have SPDX
>> identifiers than complain about the trivial-to-solve shortcomings of
>> the tool, which was written to help this effort and to make sure that it
>> does not degrade.
>
> I agree with this. Analyzing where the headers are missing is interesting.
> But it's more important to just fix the missing ones.
> I'll spend more of my time working on missing headers,
> rather than on tools to analyze and report them.

Much appreciated.

Thanks,

tglx