RE: Kernel 5.5.4 build fail for BPF-selftests with latest LLVM

From: Bird, Tim
Date: Thu Feb 20 2020 - 12:02:38 EST

> -----Original Message-----
> From: Jesper Dangaard Brouer
>
> On Wed, 19 Feb 2020 17:47:23 -0700
> shuah <shuah@xxxxxxxxxx> wrote:
>
> > On 2/19/20 5:27 PM, Alexei Starovoitov wrote:
> > > On Wed, Feb 19, 2020 at 03:59:41PM -0600, Daniel Díaz wrote:
> > >>>
> > >>> When I download a specific kernel release, how can I know what LLVM
> > >>> git-hash or version I need (to use BPF-selftests)?
> > >
> > > as discussed we're going to add documentation-like file that will
> > > list required commits in tools.
> > > This will be enforced for future llvm/pahole commits.
> > >
> > >>> Do you think it is reasonable to require end-users to compile their own
> > >>> bleeding edge version of LLVM, to use BPF-selftests?
> > >
> > > absolutely.

Is it just the BPF-selftests that require the bleeding edge version of LLVM,
or do BPF features themselves need the latest LLVM? If the latter, then this
is quite worrisome, and I fear the BPF developers are getting ahead of themselves.
We don't usually have a kernel dependency on the latest compiler version (some
recent security fixes are an anomaly). In fact deprecating support for older compiler
versions has been quite slow and methodical over the years.

It's quite dangerous to be baking stuff into the kernel that depends on features
from compilers that haven't even made it to release yet.

I'm sorry, but I'm coming into the middle of this thread. Can you please explain
what the features are in the latest LLVM that are required for BPF-selftests?
-- Tim

> > + linux-kselftest@xxxxxxxxxxxxxxx
> >
> > End-users in this context are users and not necessarily developers.
>
> I agree. And I worry that we are making it increasingly hard for
> non-developer users.
>
>
> > > If a developer wants to send a patch they must run all selftests and
> > > all of them must pass in their environment.
> > > "but I'm adding a tracing feature and don't care about networking tests
> > > failing"... is not acceptable.
> >
> > This is a reasonable expectation when a developer sends bpf patches.
>
> Sure. I have several versions of LLVM that I've compiled manually.
>
> > >
> > >>> I do hope that some end-users of BPF-selftests will be CI-systems.
> > >>> That also implies that CI-system maintainers need to constantly do
> > >>> "latest built from sources" of LLVM git-tree to keep up. Is that a
> > >>> reasonable requirement when buying a CI-system in the cloud?
> > >
> > > "buying CI-system in the cloud" ?
> > > If I could buy such system I would pay for it out of my own pocket to save
> > > maintainer's and developer's time.
>
> And Daniel Díaz wants to provide his help below (to test it on an arch
> that you likely don't even have access to). That sounds like a good
> offer, and you don't even have to pay.
>
> > >
> > >> We [1] are end users of kselftests and many other test suites [2]. We
> > >> run all of our testing on every git-push on linux-stable-rc, mainline,
> > >> and linux-next -- approximately 1 million tests per week. We have a
> > >> dedicated engineering team looking after this CI infrastructure and
> > >> test results, and as such, I can wholeheartedly echo Jesper's
> > >> sentiment here: We would really like to help kernel maintainers and
> > >> developers by automatically testing their code in real hardware, but
> > >> the BPF kselftests are difficult to work with from a CI perspective.
> > >> We have caught and reported [3] many [4] build [5] failures [6] in the
> > >> past for libbpf/Perf, but building is just one of the pieces. We are
> > >> unable to run the entire BPF kselftests because only a part of the
> > >> code builds, so our testing is very limited there.
> > >>
> > >> We hope that this situation can be improved and that our and everyone
> > >> else's automated testing can help you guys too. For this to work out,
> > >> we need some help.
> > >
> >
> > It would be helpful to understand what "help" is in this context.
> >
> > > I don't understand what kind of help you need. Just install the
> > > latest tools.
>
> I admire that you want to push *everybody* forward to use the latest
> LLVM, but saying latest is LLVM devel git tree HEAD is too extreme.
> I can support saying latest LLVM release is required.
>
> As soon as your LLVM patches are accepted into llvm-git-tree, you will
> add some BPF selftests that utilize this. When CI-systems pull latest
> bpf-next they will start to fail to compile BPF-selftests, and CI
> stops. Now you want to force CI-system maintainer to recompile LLVM
> from git. This will likely take some time. Until that happens
> CI-system doesn't catch stuff. E.g. I really want the ARM tests that
> Linaro can run for us (which isn't run before you apply patches...).
>
>
> > What would be helpful is to write bpf tests such that older tests that
> > worked on older llvm versions continue to work and with some indication
> > on which tests require new bleeding edge tools.
> >
> > > Both the latest llvm and the latest pahole are required.
> >
> > It would be helpful if you can elaborate why latest tools are a
> > requirement.
> >
> > > If by 'help' you mean to tweak selftests to skip tests then it's a nack.
> > > We have human driven CI. Every developer must run selftests/bpf before
> > > emailing the patches. Myself and Daniel run them as well before applying.
> > > These manual runs are the only thing that keeps bpf tree going.
> > > If selftests get to skip tests humans will miss those errors.
> > > When I don't see '0 SKIPPED, 0 FAILED' I go and investigate.
> > > Anything but zero is a path to broken kernels.
> > >
> > > Imagine the tests would get skipped when pahole is too old.
> > > That would mean all of the kernel features from year 2019
> > > would get skipped. Is there a point of running such selftests?
> > > I think the value is not just zero. The value is negative.
> > > Such selftests that run old stuff would give false belief
> > > that they do something meaningful.
> > > "but CI can do build only tests"... If 'helping' such CI means hurting the
> > > key developer/maintainer workflow such CI is on its own.
> > >
> >
> > Skipping tests will be useless. I am with you on that. However,
> > figuring out how to maintain some level of backward compatibility
> > to run at least older tests and warn users to upgrade would be
> > helpful.
>
> What I propose is that a BPF-selftest that uses a new LLVM feature
> should return FAIL (or perhaps SKIP) when it is compiled with, say, a
> one-release-old LLVM. This will allow new tests to show up in CI-systems
> reports as FAIL, and give everybody breathing room to upgrade their LLVM
> compiler.
>
> > I suspect currently users are ignoring bpf failures because they
> > are unable to keep up with the requirement to install newer tools
> > to run the tests. This isn't great either.
>
> Yes, my worry is also that we are simply making it too difficult for
> non-developer users to run these tests. And I specifically want to
> attract CI-systems to run these. And especially Linaro, who have a
> dedicated engineering team looking after their CI infrastructure, and
> they explicitly in this email confirm my worry.
>
>
> > Users that care are sharing their pain to see if they can get some
> > help or explanation on why new tools are required every so often.
> > I don't think everybody understands why. :)
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer