Re: Kernel 5.5.4 build fail for BPF-selftests with latest LLVM

From: Jesper Dangaard Brouer
Date: Thu Feb 20 2020 - 11:38:04 EST

On Wed, 19 Feb 2020 17:47:23 -0700
shuah <shuah@xxxxxxxxxx> wrote:

> On 2/19/20 5:27 PM, Alexei Starovoitov wrote:
> > On Wed, Feb 19, 2020 at 03:59:41PM -0600, Daniel DÃaz wrote:
> >>>
> >>> When I download a specific kernel release, how can I know what LLVM
> >>> git-hash or version I need (to use BPF-selftests)?
> >
> > as discussed we're going to add documentation-like file that will
> > list required commits in tools.
> > This will be enforced for future llvm/pahole commits.
> >
> >>> Do you think it is reasonable to require end-users to compile their own
> >>> bleeding edge version of LLVM, to use BPF-selftests?
> >
> > absolutely.
> + linux-kselftest@xxxxxxxxxxxxxxx
> End-users in this context are users and not necessarily developers.

I agree. And I worry that we are making it increasingly hard for
non-developer users.

> > If a developer wants to send a patch they must run all selftests and
> > all of them must pass in their environment.
> > "but I'm adding a tracing feature and don't care about networking tests
> > failing"... is not acceptable.
> This is a reasonable expectation when a developers sends bpf patches.

Sure. I have several versions on LLVM that I've compiled manually.

> >
> >>> I do hope that some end-users of BPF-selftests will be CI-systems.
> >>> That also implies that CI-system maintainers need to constantly do
> >>> "latest built from sources" of LLVM git-tree to keep up. Is that a
> >>> reasonable requirement when buying a CI-system in the cloud?
> >
> > "buying CI-system in the cloud" ?
> > If I could buy such system I would pay for it out of my own pocket to save
> > maintainer's and developer's time.

And Daniel DÃaz want to provide his help below (to tests it on arch
that you likely don't even have access to). That sounds like a good
offer, and you don't even have to pay.

> >
> >> We [1] are end users of kselftests and many other test suites [2]. We
> >> run all of our testing on every git-push on linux-stable-rc, mainline,
> >> and linux-next -- approximately 1 million tests per week. We have a
> >> dedicated engineering team looking after this CI infrastructure and
> >> test results, and as such, I can wholeheartedly echo Jesper's
> >> sentiment here: We would really like to help kernel maintainers and
> >> developers by automatically testing their code in real hardware, but
> >> the BPF kselftests are difficult to work with from a CI perspective.
> >> We have caught and reported [3] many [4] build [5] failures [6] in the
> >> past for libbpf/Perf, but building is just one of the pieces. We are
> >> unable to run the entire BPF kselftests because only a part of the
> >> code builds, so our testing is very limited there.
> >>
> >> We hope that this situation can be improved and that our and everyone
> >> else's automated testing can help you guys too. For this to work out,
> >> we need some help.
> >
> It would be helpful understand what "help" is in this context.
> > I don't understand what kind of help you need. Just install the
> > latest tools.

I admire that you want to push *everybody* forward to use the latest
LLVM, but saying latest is LLVM devel git tree HEAD is too extreme.
I can support saying latest LLVM release is required.

As soon as your LLVM patches are accepted into llvm-git-tree, you will
add some BPF selftests that util this. Then CI-systems pull latest
bpf-next they will start to fail to compile BPF-selftests, and CI
stops. Now you want to force CI-system maintainer to recompile LLVM
from git. This will likely take some time. Until that happens
CI-system doesn't catch stuff. E.g. I really want the ARM tests that
Linaro can run for us (which isn't run before you apply patches...).

> What would be helpful is to write bpf tests such that older tests that
> worked on older llvm versions continue to work and with some indication
> on which tests require new bleeding edge tools.
> > Both the latest llvm and the latest pahole are required.
> It would be helpful if you can elaborate why latest tools are a
> requirement.
> > If by 'help' you mean to tweak selftests to skip tests then it's a nack.
> > We have human driven CI. Every developer must run selftests/bpf before
> > emailing the patches. Myself and Daniel run them as well before applying.
> > These manual runs is the only thing that keeps bpf tree going.
> > If selftests get to skip tests humans will miss those errors.
> > When I don't see '0 SKIPPED, 0 FAILED' I go and investigate.
> > Anything but zero is a path to broken kernels.
> >
> > Imagine the tests would get skipped when pahole is too old.
> > That would mean all of the kernel features from year 2019
> > would get skipped. Is there a point of running such selftests?
> > I think the value is not just zero. The value is negative.
> > Such selftests that run old stuff would give false believe
> > that they do something meaningful.
> > "but CI can do build only tests"... If 'helping' such CI means hurting the
> > key developer/maintainer workflow such CI is on its own.
> >
> Skipping tests will be useless. I am with you on that. However,
> figuring out how to maintain some level of backward compatibility
> to run at least older tests and warn users to upgrade would be
> helpful.

What I propose is that a BPF-selftest that use a new LLVM feature,
should return FAIL (or perhaps SKIP), when it is compiled with say one
release old LLVM. This will allow new-tests to show up in CI-systems
reports as FAIL, and give everybody breathing room to upgrade their LLVM

> I suspect currently users are ignoring bpf failures because they
> are unable to keep up with the requirement to install newer tools
> to run the tests. This isn't great either.

Yes, my worry is also that we are simply making it too difficult for
non-developer users to run these tests. And I specifically want to
attract CI-systems to run these. And especially Linaro, who have
dedicated engineering team looking after their CI infrastructure, and
they explicitly in this email confirm my worry.

> Users that care are sharing their pain to see if they can get some
> help or explanation on why new tools are required every so often.
> I don't think everybody understands why. :)

Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat