Re: [PATCH v2] Makefile: Add clang-tidy and static analyzer support to makefile

From: Nick Desaulniers
Date: Mon Jul 13 2020 - 18:09:40 EST


(bumping a few points for v3)

On Thu, Jul 9, 2020 at 10:56 AM Nathan Huckleberry <nhuck@xxxxxxxxxx> wrote:
>
> On Wed, Jul 8, 2020 at 2:11 PM Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
> >
> > On Wed, Jul 8, 2020 at 11:21 AM 'Nathan Huckleberry' via Clang Built
> > Linux <clang-built-linux@xxxxxxxxxxxxxxxx> wrote:
> > >
> > I think we should add scripts/clang-tools/ to MAINTAINERS under
> > CLANG/LLVM SUPPORT:
> > ```
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index c87b94e6b2f6..42602231929c 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -4211,6 +4211,7 @@ W: https://clangbuiltlinux.github.io/
> > B: https://github.com/ClangBuiltLinux/linux/issues
> > C: irc://chat.freenode.net/clangbuiltlinux
> > F: Documentation/kbuild/llvm.rst
> > +F: scripts/clang-tools/
> > K: \b(?i:clang|llvm)\b
> >
> > CLEANCACHE API
> > ```
> > that way we get cc'ed properly on proposed changes (should folks use
> > scripts/get_maintainer.pl).

bump

> > > --- /dev/null
> > > +++ b/scripts/clang-tools/run-clang-tools.py
> > > @@ -0,0 +1,77 @@
> > > +#!/usr/bin/env python
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +#
> > > +# Copyright (C) Google LLC, 2020
> > > +#
> > > +# Author: Nathan Huckleberry <nhuck@xxxxxxxxxx>
> > > +#
> > > +"""A helper routine run clang-tidy and the clang static-analyzer on
> > > +compile_commands.json."""
> > > +
> > > +import argparse
> > > +import json
> > > +import logging
> > > +import multiprocessing
> > > +import os
> > > +import re

Is `re` being used anywhere?

> > > +import subprocess
> > > +
> > > +def parse_arguments():
> > > + """Set up and parses command-line arguments.
> > > + Returns:
> > > + args: Dict of parsed args
> > > + Has keys 'file' and 'type'
> > > + """
> > > + usage = """Run clang-tidy or the clang static-analyzer on a
> > > + compilation database."""
> > > + parser = argparse.ArgumentParser(description=usage)
> > > +
> > > + type_help = ('Type of analysis to be performed')
> > > + parser.add_argument('type', choices=['clang-tidy', 'static-analyzer'],
> > > + help=type_help)
> > > + file_path_help = ('Path to the compilation database to parse')
> > > + parser.add_argument('file', type=str, help=file_path_help)
> >
> > I don't know if the kernel has a preferred style for Python, but I
> > think it would be good to be consistent in the use of single vs double
> > quotes for strings. My preference is for double quotes, but I don't
> > know enough about the various PEPs for style or if the kernel has a
> > preferred style for these.

double quotes.

> > > +
> > > + args = parser.parse_args()
> > > +
> > > + return args
> > > +
> > > +def init(l,t):
> > > + global lock
> > > + global analysis_type
> > > + lock = l
> > > + analysis_type = t
> >
> > Is this canonical Python? Maybe wrap these functions into methods of
> > an object you construct, that way you can assign these as instance
> > variables against `self`, rather than using global variables.
>
> I did this to allow shared locks between processes, see
> https://stackoverflow.com/questions/25557686/python-sharing-a-lock-between-processes

Ah, ok, I see the problem. In that case, I'm less worried about this.
`global` just sets off red flags unless there is a very good reason to
use it.

>
> >
> > > +
> > > +def run_analysis(entry):
> > > + filename = entry['file']
> > > + p = None
> > > + if(analysis_type == "clang-tidy"):
> > > + p = subprocess.run(["clang-tidy", "-p", os.getcwd(),
> > > + "-checks=-*,linuxkernel-*", filename],
> > > + stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> > > + if(analysis_type == "static-analyzer"):
> > > + p = subprocess.run(["clang-tidy", "-p", os.getcwd(),
> > > + "-checks=-*,clang-analyzer-*", filename],

Is it worthwhile to NOT run `-*` passes and only run
`clang-analyzer-*`? Otherwise `make clang-analyzer` and `make
clang-tidy` contain a ton of duplicate info.

> > > + stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> >
> > When you have a fair amount of duplication between two branches of an
> > if/else (for instance, same method invocation and number of
> > parameters, just slight differences in parameter values), consider if
> > you can use a ternary to simplify or make the code more concise. That
> > would also help avoid initializing `p` to `None`:
> >
> > checks = "-checks=-*,linuxkernel-*" if analysis_type == "clang-tidy"
> > else "-checks=-*,clang-analyzer-*"
> > p = subprocess.run(["clang-tidy", "-p", os.getcwd(), checks,
> > stdout=subprocess.PIPE, stderr=subprocess.PIPE]
> >
> > then maybe do some validation of the analysis_type when validating
> > command line arguments earlier.
>
> Argparse should already handle validation of the analysis type.

Cool, I still think the ternary can simplify a v3.

>
> >
> > > + lock.acquire()
> > > + print(entry['file'])
> > > + os.write(1, p.stdout)
> > > + os.write(2, p.stderr)
> >
> > Please use sys.stdout and sys.stderr rather than magic constants for
> > their file descriptors.

Also, I'm not a fan of how clang-tidy writes the errors to stdout.

$ make LLVM=1 -j71 defconfig clang-tidy 2> log.txt
write part of the log, and spews to stdout. Do you think it would
make sense to redirect stdout from clang-tidy to stderr for this
script?

$ grep error: log.txt | sort -u
$ grep clang-analyzer log.txt | sort -u
Checking some of the clang-tidy warnings, some seem harmless.
linux-next/net/core/devlink.c:9527:6: error: redefinition of
'devlink_compat_running_version' [clang-diagnostic-error]
looks legit, though not terribly important to fix ASAP. So that's cool.
The clang-analyzer report is a little beefier, once the traces start
getting long they become fairly hard to follow. Is it possible to dump
the html report of these? I guess the issue with that is that we
wouldn't be able to join them in the python script.

$ grep clang-analyzer log.txt | sort -u | cut -d '[' -f 2 | sort -u
clang-analyzer-core.CallAndMessage]
clang-analyzer-core.DivideZero]
clang-analyzer-core.NonNullParamChecker]
clang-analyzer-core.NullDereference]
clang-analyzer-core.StackAddressEscape]
clang-analyzer-core.UndefinedBinaryOperatorResult]
clang-analyzer-core.uninitialized.ArraySubscript]
clang-analyzer-core.uninitialized.Assign]
clang-analyzer-core.uninitialized.Branch]
clang-analyzer-core.uninitialized.UndefReturn]
clang-analyzer-deadcode.DeadStores]
clang-analyzer-optin.performance.Padding]
clang-analyzer-optin.portability.UnixAPI]
clang-analyzer-security.insecureAPI.bcmp]
clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
clang-analyzer-security.insecureAPI.strcpy]
clang-analyzer-unix.cstring.NullArg]
clang-analyzer-unix.Malloc]
clang-analyzer-valist.Uninitialized]

interesting. I like how clang-analyzer warns about bcmp and yet llvm
will generate calls to bcmp...
--
Thanks,
~Nick Desaulniers