Re: [PATCH v3 0/3] x86/cacheinfo: Set the number of leaves per CPU

From: Andreas Herrmann
Date: Fri Sep 01 2023 - 03:51:45 EST


On Fri, Sep 01, 2023 at 08:50:31AM +0200, Andreas Herrmann wrote:
> On Fri, Aug 04, 2023 at 06:24:18PM -0700, Ricardo Neri wrote:
> > Hi,
>
> Hi Ricardo,
>
> > This is v3 of a patchset to set the number of cache leaves independently
> > for each CPU. v1 and v2 can be found here [1] and here [2].
>
> I am on CC of your patch set and glanced through it.
> Long ago I touched related code, but I am no longer up-to-date
> enough to do a qualified review in this area. First, I would have to
> look into the documentation to refresh my memory, and so on.
>
> I've not seen (or it escaped me) any information that this was tested
> on a variety of machines that might be affected by the change, and
> there are no Tested-by tags.
>
> Even if changes look simple and reasonable they can cause issues.
>
> Thus, from my POV, it would be good to have some information about
> which tests were done. I am not asking for testing on all possible
> systems; just knowing which system(s) were used for functional
> testing is of value.

Doing a good review -- trying to catch every flaw -- is really hard,
especially when you are not actively doing development work in the
area.

For example, see

commit e33b93650fc5 ("blk-iocost: Pass gendisk to ioc_refresh_params")
[Breno Leitao <leitao@xxxxxxxxxx>, Tue Feb 28 03:16:54 2023 -0800]

This fixes commit

ce57b558604e ("blk-rq-qos: make rq_qos_add and rq_qos_del more
useful") [Christoph Hellwig <hch@xxxxxx>, Fri Feb 3 16:03:54 2023
+0100]

I had reviewed the latter (see
https://marc.info/?i=Y8plg6OAa4lrnyZZ@suselix) and the entire patch
series. I compared the original code with the patch, walked through
every single hunk of the diff, and tried to think it through. The
changes made sense to me. Then came the bug report(s), and I felt that
I had failed tremendously. To err is human.

What this shows (and it is already known) is that every patch can
potentially introduce new errors into the kernel. Functional and
higher-level testing can help to spot them before a kernel is deployed
in the field.

Viewed at a higher level, this illustrates another thing: Linux kernel
development is a transparent example of a peer-review process.

In our scientific age it is often postulated that peer review is the
way to go[1] and that it more or less guarantees that what is
published, or rolled out, is reasonable and "works".

The above example shows that this process will not catch all flaws and
that proper, transparent, and reliable tests are required before
anything is deployed in the field.

This is true for every branch of science.

If such testing is purposely omitted, something is fishy.


[1] Some even state that it is "the only way to go" and that
everything figured out without a peer-review process is false and
cannot be trusted. Of course, this is a false claim.

--
Regards,
Andreas

SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nürnberg, Germany
GF: Ivo Totev, Andrew McDonald, Werner Knoblich
(HRB 36809, AG Nürnberg)