Re: [PATCH] docs: license-rules.txt: cover SPDX headers on Python scripts
From: Mauro Carvalho Chehab
Date: Thu Sep 05 2019 - 08:07:11 EST
On Thu, 5 Sep 2019 11:27:03 +0200,
Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Sep 05, 2019 at 06:23:13AM -0300, Mauro Carvalho Chehab wrote:
> > The author of the license-rules.rst file wanted to be very restrictive
> > with regard to the location of the SPDX header. It says that
> > the SPDX header "shall be added at the first possible line in
> > a file which can contain a comment". Not happy with this already
> > restrictive requirement, it goes further:
> >
> > "For the majority of files this is the first line, except for
> > scripts", opening an exception to have the SPDX header at the
> > second line, if the first line starts with "#!".
> >
> > Well, it turns out that this is too restrictive for Python scripts,
> > and may cause regressions if it were enforced.
> >
> > As mentioned on:
> > https://stackoverflow.com/questions/728891/correct-way-to-define-python-source-code-encoding
> >
> > Python's PEP-263 [1] dictates that a script that needs UTF-8
> > encoding instead of the ASCII default has to follow this rule:
> >
> > 'Python will default to ASCII as standard encoding if no other
> > encoding hints are given.
> >
> > To define a source code encoding, a magic comment must be placed
> > into the source files either as first or second line in the file'
> >
> > And:
> > 'More precisely, the first or second line must match the following
> > regular expression:
> >
> > ^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)'
> >
> > [1] https://www.python.org/dev/peps/pep-0263/
> >
> > If a script has both the "#!" and the charset encoding lines, we can't
> > place an SPDX tag without either violating license-rules.rst or breaking
> > the script by making it crash on non-ASCII characters.
> >
> > So, add a short notice saying that, for Python scripts, the SPDX
> > header may be placed as late as the third line, in order to cover the
> > case where both "#!" and "# .*coding.*UTF-8" lines are found.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx>
> > ---
> > Documentation/process/license-rules.rst | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/process/license-rules.rst b/Documentation/process/license-rules.rst
> > index 2ef44ada3f11..5d23e3498b1c 100644
> > --- a/Documentation/process/license-rules.rst
> > +++ b/Documentation/process/license-rules.rst
> > @@ -64,9 +64,12 @@ License identifier syntax
> > possible line in a file which can contain a comment. For the majority
> > of files this is the first line, except for scripts which require the
> > '#!PATH_TO_INTERPRETER' in the first line. For those scripts the SPDX
> > - identifier goes into the second line.
> > + identifier goes into the second line\ [1]_.
> >
> > -|
> > +.. [1] Please notice that Python scripts may also need an encoding rule
> > + as defined on PEP-263, which should be defined either at the first
> > + or the second line. So, for such scripts, the SPDX identifier may
> > + go up to the third line.
> >
> > 2. Style:
> >
>
> If you are going to do this, can you also fix up scripts/spdxcheck.py to
> properly catch this?
For completeness, I just added a check for it, plus a "stats" mode to the
script that reports on which line the first SPDX tag occurs.
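A minimal sketch of the kind of check being discussed, assuming the rule from the patch: the SPDX tag is accepted on line 1, on line 2 after a "#!" line, and one line later for Python scripts that carry a PEP-263 coding comment. The function names (first_spdx_line, allowed_spdx_line, check) are hypothetical, and detecting Python by a ".py" suffix or "python" in the shebang is an assumption; this is not the code in scripts/spdxcheck.py or in the spdx_pedantic branch.

```python
import re

# PEP-263 magic-comment regex, as quoted in the patch description.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')
SPDX_TAG = 'SPDX-License-Identifier:'


def first_spdx_line(lines):
    """Return the 1-based line number of the first SPDX tag, or None."""
    for lineno, line in enumerate(lines, start=1):
        if SPDX_TAG in line:
            return lineno
    return None


def allowed_spdx_line(lines, filename):
    """Latest line on which the SPDX tag is accepted (hypothetical rule).

    Line 1 by default, line 2 after a '#!' interpreter line, plus one
    extra line for Python scripts with a PEP-263 coding comment in the
    first two lines.
    """
    head = lines[:2]
    has_shebang = bool(head) and head[0].startswith('#!')
    allowed = 2 if has_shebang else 1
    # Assumption: a file is "Python" if named *.py or run by a python shebang.
    is_python = filename.endswith('.py') or (
        has_shebang and 'python' in head[0])
    if is_python and any(CODING_RE.match(line) for line in head):
        allowed += 1
    return allowed


def check(lines, filename):
    """Return (ok, first_line) for the lines of a single file."""
    first = first_spdx_line(lines)
    if first is None:
        return False, None
    return first <= allowed_spdx_line(lines, filename), first


if __name__ == '__main__':
    script = ['#!/usr/bin/env python3',
              '# -*- coding: utf-8 -*-',
              '# SPDX-License-Identifier: GPL-2.0']
    print(check(script, 'foo.py'))   # (True, 3)
    header = ['/*', ' * SPDX-License-Identifier: GPL-2.0', ' */']
    print(check(header, 'foo.c'))    # (False, 2)
```

The third line is only granted when a coding comment is actually present, so a plain "#!" Python script is still held to line 2.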
I'll probably rework the patch later, in order to disable the pedantic
mode by default.
There are currently 227 files that don't comply with the "up to line 3"
rule, including COPYING (which should probably be excluded from the check).
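The "stats" mode can be approximated by walking a tree, recording the line of the first SPDX tag in each file, and listing the files that go past line 3. Again only a hypothetical sketch: spdx_stats, the 15-line scan limit, and excluding COPYING by file name are assumptions, not what the patches do. Files with no tag inside the scan limit are tallied under None.

```python
import os
from collections import Counter
from itertools import islice

SPDX_TAG = 'SPDX-License-Identifier:'
MAX_LINES = 15           # assumption: only the head of each file is scanned
EXCLUDE = {'COPYING'}    # file names skipped entirely (assumption)


def first_spdx_line(path):
    """1-based line of the first SPDX tag within MAX_LINES, else None."""
    with open(path, encoding='utf-8', errors='replace') as f:
        for lineno, line in enumerate(islice(f, MAX_LINES), start=1):
            if SPDX_TAG in line:
                return lineno
    return None


def spdx_stats(root, limit=3):
    """Histogram of first-SPDX line numbers, plus the files past `limit`."""
    histogram = Counter()
    late = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the git metadata directory.
        dirnames[:] = [d for d in dirnames if d != '.git']
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip excluded names and anything that is not a regular file
            # (e.g. broken symlinks).
            if name in EXCLUDE or not os.path.isfile(path):
                continue
            lineno = first_spdx_line(path)
            histogram[lineno] += 1
            if lineno is not None and lineno > limit:
                late.append(path)
    return histogram, sorted(late)
```

Keeping the histogram separate from the violation list makes it easy to see whether most offenders sit just one line past the limit or much further down.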
Patches are at:
https://git.linuxtv.org/mchehab/experimental.git/log/?h=spdx_pedantic
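Btw, most violations are due to files whose first line opens a multi-line
comment, so the SPDX tag only shows up on the second line:

    /*
     * SPDX...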
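To see how many of the reported files follow that pattern, a hypothetical helper could test whether a file opens with a bare "/*" line and carries the tag on the " * " continuation line. The name is_split_block_comment and its regex are illustrative only and not part of spdxcheck.py.

```python
import re

# Second line of a block comment carrying the tag, e.g.
# " * SPDX-License-Identifier: GPL-2.0".
SPDX_IN_BLOCK_RE = re.compile(r'^\s*\*\s*SPDX-License-Identifier:')


def is_split_block_comment(lines):
    """True if line 1 is a bare '/*' and line 2 is a ' * SPDX...' line."""
    return (len(lines) >= 2
            and lines[0].strip() == '/*'
            and bool(SPDX_IN_BLOCK_RE.match(lines[1])))


def split_block_comment_files(files):
    """files: mapping of path -> list of lines; return the matching paths."""
    return sorted(path for path, lines in files.items()
                  if is_split_block_comment(lines))
```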
Regards,
Mauro