Re: [PATCH 0/6] Address issues with SPDX requirements and PEP-263

From: Markus Heiser
Date: Sat Sep 07 2019 - 14:37:49 EST


On 07.09.19 at 20:04, Mauro Carvalho Chehab wrote:
On Sat, 7 Sep 2019 19:33:06 +0200,
Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote:
An (uncaught) exception is thrown when writing UTF-8 to a stream which
does not support UTF-8 .. this is not a crash; it mostly indicates that the
developer made a wrong assumption about the use case.

An unhandled exception is a crash in Python. I've seen Python scripts
crash countless times with non-English names.

This has nothing to do with the language, ask the developer of those scripts.

It is also possible to encode the UTF-8 text to ASCII and replace unknown
code points in the output stream, or to catch the exception.
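
As a minimal sketch of the two options just mentioned (the function names to_ascii and try_ascii are hypothetical, made up for illustration only), one could either replace unknown code points while encoding, or encode strictly and catch the exception:

```python
def to_ascii(text: str) -> bytes:
    """Encode text as ASCII, replacing unknown code points with '?'."""
    return text.encode("ascii", errors="replace")


def try_ascii(text: str) -> str:
    """Encode strictly, but catch the exception instead of crashing."""
    try:
        return text.encode("ascii").decode("ascii")
    except UnicodeEncodeError:
        # Assumed fallback policy: keep the information as backslash escapes.
        return text.encode("ascii", errors="backslashreplace").decode("ascii")
```

Which fallback is right depends on the application; the point is only that the exception can be handled where the target encoding is known.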

Yeah, but getting this right is very painful. I've used patchwork since 2013.
It took *years* for it to not crash with non-ASCII chars[1]. That's, btw,
the primary reason why I don't usually use Python: with other languages,
an alien char doesn't cause a crash.

Python cares about encoded (text) string types, while other languages and
applications just pipe bytes to streams .. if you care about the
encoding, you need exceptions when someone wants to write UTF-8 to an
ASCII output.

Anyway this is a bit of nitpicking / not helping here ..


[1] I might be wrong, but the last patch I saw addressing an issue
there was applied this year.

I already posted an example [1]

<snip>
This means your application has to know the encoding of a stream/file.
E.g. we handle the output of the external Perl script
scripts/kernel-doc by decoding the byte streams from the process call's
stdout and stderr as UTF-8:

out, err = codecs.decode(out, 'utf-8'), codecs.decode(err, 'utf-8')

see patch https://github.com/torvalds/linux/commit/86c0f046a8b0c23fca65f77333c233a06c25ef9a

Again, this is talking about application development and has
nothing to do with the encoding of the source files.
<snap>
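
A minimal sketch of the decoding step quoted above, assuming a generic child process rather than the real kernel-doc call (run_and_decode is a hypothetical helper, and errors="replace" is an assumption; the quoted line decodes strictly):

```python
import subprocess
import sys


def run_and_decode(cmd, encoding="utf-8"):
    """Run cmd and return its stdout/stderr decoded from bytes to text."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # The application, not the source file, decides how the bytes are decoded.
    out = proc.stdout.decode(encoding, errors="replace")
    err = proc.stderr.decode(encoding, errors="replace")
    return out, err


if __name__ == "__main__":
    # Stand-in child process that writes UTF-8 bytes to its stdout.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(b'caf\\xc3\\xa9')"]
    print(run_and_decode(child))
```

None of this depends on a coding line in the calling script; the encoding is a property of the stream being read.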

[1] https://www.mail-archive.com/linux-doc@xxxxxxxxxxxxxxx/msg33240.html


But this was only academic; where do we have such problems in practice?

At least on media, we define that some Kernel strings can be UTF-8.
See, for example, the model field in the media_entity struct:

https://linuxtv.org/downloads/v4l-dvb-apis/kapi/mc-core.html

As stated there:

"media_entity.model must be filled with the device model name as
a NUL-terminated UTF-8 string. The device/model revision must
not be stored in this field."

I've no idea if the two perf scripts that contain the encoding data are
meant to print some strings that may be UTF-8 encoded (like those that
we have in the media subsystem), or if it is just that whoever added
it was using Emacs and wanted to make their life simpler. As it is better
to be safe than sorry, in patches 2 and 3 I'm assuming the first case.

Hm, I'm unsure if I understand you correctly: using UTF-8 in the .rst
files is fine .. where do we have scripts generating UTF-8 output
(except the HTML output)?

In theory, perf scripts may be reading strings from the Kernel, which
might be using UTF-8 encoding.



In any case, we do need the encoding line in Sphinx extensions,
although there the shebang line is optional.

In other words, we have those alternatives:

1) Neither shebang nor coding -> SPDX will be at first line;
2) shebang + SPDX -> SPDX will be at the second line;
3) shebang + coding + SPDX -> SPDX will be at the third line;
4) coding + SPDX

This is something that only makes sense for Sphinx extensions.

IMHO, I would place SPDX at the second line too, but I *guess* Python
may accept it at the first line and would still properly evaluate
coding (as this technically satisfies the text at PEP-263).
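
One way to check that guess is to ask Python's own tokenizer, which implements the PEP-263 lookup. This is only a sketch: declared_encoding is a hypothetical helper, and latin-1 is used instead of utf-8 purely so that a coding line that is honoured can be told apart from the default.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the source encoding as Python's tokenizer detects it."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


SHEBANG = b"#!/usr/bin/env python3\n"
CODING = b"# -*- coding: latin-1 -*-\n"
SPDX = b"# SPDX-License-Identifier: GPL-2.0\n"

# SPDX on line 1, coding on line 2: the coding line is still evaluated.
print(declared_encoding(SPDX + CODING))
# Coding pushed to line 3: it is ignored and the default encoding applies.
print(declared_encoding(SHEBANG + SPDX + CODING))
```

So the coding line is only picked up on line one or two, whatever the other comment line contains.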

Why are you so restrictive ..

No idea. I would actually prefer to just remove the restriction, and let
the SPDX header be anywhere inside the first comment block of a
file [2].

That's basically how this thread started: other developers think
that it is a good idea to be pedantic. So, be it, but let's then fix
the documentation, as the way it is, it is implicitly forbidding the
addition of encoding lines for Python scripts.

[2] I *suspect* that the restriction was added in order to make
./scripts/spdxcheck.py to run faster and to avoid false positives.
Right now, if the maximum limit is removed (or set to a very high
value), there will be one false positive:

Documentation/dev-tools/kselftest.rst

This doc has a SPDX-like tag at line 230, asking people to add SPDX
headers on files, but the file itself doesn't have its own SPDX tag.
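
To illustrate the trade-off described in [2], here is a rough sketch of a tag scan with an optional line limit. It is not the logic of ./scripts/spdxcheck.py; find_spdx_line and its max_lines parameter are hypothetical names used only for this example.

```python
import re

# Simplified pattern; a real checker would also validate the license expression.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(\S.*)")


def find_spdx_line(text, max_lines=None):
    """Return the 1-based line number of the first SPDX tag, or None."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if max_lines is not None and lineno > max_lines:
            break  # stop early: cheaper, and avoids matches deep in a file
        if SPDX_RE.search(line):
            return lineno
    return None
```

With a limit of 2, a header of shebang + coding + SPDX is reported as missing its tag; with no limit, a tag-like example deep inside a document (as in kselftest.rst) is reported as if it were the file's own tag.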

What we normally do:

- write a shebang line if this file is called directly from the
command line .. but we do not need shebangs on py modules which
are imported from other modules or scripts

- write an encoding line if it is needed or helpful / mostly it is helpful
to know the encoding of a text/code file.

- add a SPDX tag

Yes, but this violates the current documentation, as it doesn't allow the
SPDX tag after line #2.

That's what I mean: the documentation was written with only a few use cases
in mind .. there is no real need for SPDX to be on line one or two ... let's
fix the documentation as I described before.

Side note: if I can help you with perf or your build systems, don't hesitate
to contact me directly.

-- Markus --

In the end we will have files with one, two or all three of these lines.
And the order of these lines is what I wrote:


That's what I mean [1] .. let's patch the description in license-rules.rst::

- first line for the OS (shebang)
- second line for environment (python-encoding, editor-mode, ...)
- third and more lines for application (SPDX use) ..

[1] https://www.mail-archive.com/linux-doc@xxxxxxxxxxxxxxx/msg33240.html

-- Markus --
This suggests to me that we're adding a bunch of complications that we
don't necessarily need. What am I missing here?

Educate me properly and I'll not try to stand in the way of all this...


It seems I'm not the only one who is missing something .. what are
the use cases where we get Python exceptions, and what are the use cases
for being as restrictive as you described above?

.. or did alice get lost in the cave?

Thanks for your patience with me

-- Markus --



Thanks,
Mauro