Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()

From: Joe Perches
Date: Sun Sep 18 2022 - 17:40:51 EST


On Sun, 2022-09-18 at 22:32 +0200, Janne Grunau wrote:
> On 2022-09-18 10:03:17 -0700, Joe Perches wrote:
> > On Sat, 2022-09-17 at 07:11 -0700, Joe Perches wrote:
> > > On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> > > > Extend the regexp matching name characters to cover Unicode blocks Latin
> > > > Extended-A and Extended-B.
> > > > Fixes 'scripts/get_maintainer.pl -f' for
> > > > 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
[]
> > > > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> > > []
> > > > @@ -442,7 +442,7 @@ sub maintainers_in_file {
> > > > my $text = do { local($/) ; <$f> };
> > > > close($f);
> > > >
> > > > - my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > > + my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > >
> > > my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> >
> > Using variations of \p{posix} doesn't seem to work for at least perl 5.34.
> >
> > \p{print} seems to work for Documentation/devicetree/bindings/clock/apple,nco.yaml,
> > but I don't know how fragile it is.
> >
> > \p{print} might be too greedy...
>
> It is. It produces the following diff (checking all files in
> Documentation/devicetree/bindings):
> -Lubomir Rintel <lkundrak@xxxxx> (in file)
> +"Copyright 2019,2020 Lubomir Rintel" <lkundrak@xxxxx> (in file)
>
> There are multiple hits of this form. The main issue is that \p{print}
> includes space. On the other hand, it fixes many names with 3 parts.

Right.
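
To illustrate the greediness, here is a minimal standalone sketch. It is
not the script's code: the sample line and its example.com address are
made up, and the address part of the regexp is simplified.

```perl
use strict;
use warnings;

# Hypothetical sample line, modelled on the diff quoted above.
my $text = 'Copyright 2019,2020 Lubomir Rintel <lkundrak@example.com>';

# Simplified address tail shared by both patterns (assumption: not the
# exact regexp used in get_maintainer.pl).
my $addr = qr/[\(\<\{]?[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]?/;

# Name class from the patch: ASCII letters plus U+00C0..U+024F.
my ($narrow) = $text =~ /([A-Za-z\x{C0}-\x{24F}\"\' \,\.\+-]*\s*$addr)/;

# Name class using \p{print}, which also matches digits, spaces and brackets.
my ($wide) = $text =~ /([\p{print}\"\' \,\.\+-]*\s*$addr)/;

# Trim leading whitespace so the two results are easy to compare.
s/^\s+// for ($narrow, $wide);

print "narrow: $narrow\n";
print "wide:   $wide\n";
```

The narrow class stops at the digits, so only the name and address are
captured. The \p{print} class runs back to the start of the line and
pulls in the copyright text as well.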

> > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
[]
> > @@ -2456,11 +2456,12 @@ sub clean_file_emails {
> > foreach my $email (@file_emails) {
> > $email =~ s/[\(\<\{]{0,1}([A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+)[\)\>\}]{0,1}/\<$1\>/g;
> > my ($name, $address) = parse_email($email);
> > + $name =~ s/^\p{space}*\p{punct}*\p{space}*//;
>
> This change is useful independently of the name regexp as it rejects
> '- <email@xxxxxxxx>' (yaml list items) as a valid name/email combination.

Good. The below might be a bit better too:

$name =~ s/(?:\p{space}|\p{punct})*//;
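
A small sketch comparing the two substitutions on a few made-up names.
The helper names strip_once and strip_all are hypothetical and exist
only for this illustration; they are not functions in get_maintainer.pl.

```perl
use strict;
use warnings;

# Variant from the patch: one run of spaces, one run of punctuation,
# then one run of spaces, anchored at the start of the name.
sub strip_once {
    my ($name) = @_;
    $name =~ s/^\p{space}*\p{punct}*\p{space}*//;
    return $name;
}

# Second variant: any mix of leading spaces and punctuation. The '*'
# lets it match (possibly empty) at position 0, so it only ever touches
# the start of the name.
sub strip_all {
    my ($name) = @_;
    $name =~ s/(?:\p{space}|\p{punct})*//;
    return $name;
}

for my $name ('- ', ' - Jane Doe', '- - Jane Doe', 'Jane Doe') {
    printf "%-16s once=[%s] all=[%s]\n",
        "[$name]", strip_once($name), strip_all($name);
}
```

Both variants reduce a bare yaml list marker ('- ') to an empty name and
leave 'Jane Doe' alone. They differ on '- - Jane Doe': the first leaves
'- Jane Doe', the second leaves 'Jane Doe'.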