Re: [RFC] cxl_test: upgrade as a first class citizen selftests capable driver

From: Luis Chamberlain
Date: Wed Jan 04 2023 - 14:49:33 EST


On Tue, Dec 20, 2022 at 11:51:41AM -0800, Dan Williams wrote:
> Luis Chamberlain wrote:
> > On Mon, Dec 19, 2022 at 04:27:10PM -0800, Dan Williams wrote:
> > > Luis Chamberlain wrote:
> > > > On Fri, Dec 16, 2022 at 08:55:19PM -0800, Dan Williams wrote:
> > > > > In other words the suggestion that the current
> > > > > organization ultimately leads to bit rot has not been substantiated in
> > > > > practice.
> > > >
> > > > On top of this patch I just added a custom debug patch to my tree which
> > > > enables CXL_BUS and CXL_TEST by default when this is currently allowed
> > > > and it got quite a bit of kernel build warnings. Although some of these
> > > > are specific to my change, some of them do not seem to be related to
> > > > that and likely could benefit from fixing:
> > > >
> > > > https://gist.github.com/mcgrof/73dce72939590c6edc9413b0384ae4c2
> > > >
> > > > And so although you may not see some build warnings so far, it does not
> > > > negate my suggestion that having cxl_test as a proper upstream driver strategy
> > > > gets you more build testing / coverage.
> > >
> > > If autobuild coverage of test components is the main concern then
> > > cxl_test can copy what nfit_test is doing with CONFIG_NVDIMM_TEST_BUILD.
> > > No need for disruptive redesign of how this facility is integrated.
> >
> > I've itemized a list of gains of having this properly integrated. What
> > gains are there of this being an external module other than a few folks
> > are used to it and it been done before for other subsystems?
>
> Your crash report is a prime example of why this needs to stay an
> external module.

The crash *can* be avoided completely *iff* the semantics over the
requirements are expressed clearly through kconfig. My follow up patch
to the top level Makefile INSTALL_MOD_DIR to use "updates" instead of
"extra" essentially exposed anyone other than folks using a specific
version of RHEL or Fedora *easily* can end up crashing with cxl_test.
That's I think a far much worse predicament.

> Any redefinition of what a symbol does via --wrap= is a
> fragile proposition.

It is what cxl_test does though. Supporting it as a module Vs built-in
has no difference except as exposing semantics and requirements clearly.

> The fact that crash signatures with cxl_test loaded
> have the external module taint flag set is a feature.

I helped review the patch that added the taint flag for all testing
modules, that it does not mean we can't add it for built-in. This can
easily be done for instance with a kconfig symbol which pegs the taint
for any test module as built-in. In fact if we're not tainint built-in
test modules that change should happen anyway.

Having a test module be built-in or not shoud in no way shape or form
affect your testing. If the driver *happens* to rely on module load
and unload to any clean state machine -- that should be fixed given
the slew of bugs I have found other test modules which follow similar
logic.

> The --wrap= option
> has no business within the main tree

If --wrap is really unreliable, the unreliable aspects should be
documented, however it seems in this case it is just seems for cxl
its only used for testing. And whether or not you test with built-in
or modules should have no effect over --wrap.

> because it violates the valid
> assumptions of other cxl_test-innocent developers.

By having cxl_test be a proper upstream driver you define the
requirements clearly, and the kconfig symbols enabled are sufficient
to only let that module be built if folks are ready to shoot themseves
on the foot. Today the semantics are not clear, and in fact relies on
a old distro INSTALL_MOD_DIR assumption.

> The benefit that resonated with me during this discussion was more
> compile test coverage for cxl_test components.

Those benefits still stand.

> However, that is achieved
> by tools/testing/cxl/ adopting the same compile coverage scheme that
> tools/testing/nvdimm/ has with CONFIG_NVDIMM_TEST_BUILD.

That does not by any means mean that CONFIG_NVDIMM_TEST_BUILD can also
be converted to allow built-in. In fact I'd argue the same for that too.
The same benefits applies there too.

Luis