Re: [PATCH v11 5/6] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods

From: Vaibhav Jain
Date: Wed Jun 10 2020 - 08:10:30 EST


Dan Williams <dan.j.williams@xxxxxxxxx> writes:

> On Tue, Jun 9, 2020 at 10:54 AM Vaibhav Jain <vaibhav@xxxxxxxxxxxxx> wrote:
>>
>> Thanks Dan for the consideration and taking time to look into this.
>>
>> My responses below:
>>
>> Dan Williams <dan.j.williams@xxxxxxxxx> writes:
>>
>> > On Mon, Jun 8, 2020 at 5:16 PM kernel test robot <lkp@xxxxxxxxx> wrote:
>> >>
>> >> Hi Vaibhav,
>> >>
>> >> Thank you for the patch! Perhaps something to improve:
>> >>
>> >> [auto build test WARNING on powerpc/next]
>> >> [also build test WARNING on linus/master v5.7 next-20200605]
>> >> [cannot apply to linux-nvdimm/libnvdimm-for-next scottwood/next]
>> >> [if your patch is applied to the wrong git tree, please drop us a note to help
>> >> improve the system. BTW, we also suggest to use '--base' option to specify the
>> >> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>> >>
>> >> url: https://github.com/0day-ci/linux/commits/Vaibhav-Jain/powerpc-papr_scm-Add-support-for-reporting-nvdimm-health/20200607-211653
>> >> base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
>> >> config: powerpc-randconfig-r016-20200607 (attached as .config)
>> >> compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project e429cffd4f228f70c1d9df0e5d77c08590dd9766)
>> >> reproduce (this is a W=1 build):
>> >> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>> >> chmod +x ~/bin/make.cross
>> >> # install powerpc cross compiling tool for clang build
>> >> # apt-get install binutils-powerpc-linux-gnu
>> >> # save the attached .config to linux build tree
>> >> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc
>> >>
>> >> If you fix the issue, kindly add following tag as appropriate
>> >> Reported-by: kernel test robot <lkp@xxxxxxxxx>
>> >>
>> >> All warnings (new ones prefixed by >>, old ones prefixed by <<):
>> >>
>> >> In file included from <built-in>:1:
>> >> >> ./usr/include/asm/papr_pdsm.h:69:20: warning: field 'hdr' with variable sized type 'struct nd_cmd_pkg' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
>> >> struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
>> >
>> > Hi Vaibhav,
>> >
>> [.]
>> > This looks like it's going to need another round to get this fixed. I
>> > don't think 'struct nd_pdsm_cmd_pkg' should embed a definition of
>> > 'struct nd_cmd_pkg'. An instance of 'struct nd_cmd_pkg' carries a
>> > payload that is the 'pdsm' specifics. As the code has it now it's
>> > defined as a superset of 'struct nd_cmd_pkg' and the compiler warning
>> > is pointing out a real 'struct' organization problem.
>> >
>> > Given the soak time needed in -next after the code is finalized this
>> > there's no time to do another round of updates and still make the v5.8
>> > merge window.
>>
>> Agreed that this looks bad, a solution will probably need some more
>> review cycles resulting in this series missing the merge window.
>>
>> I am investigating into the possible solutions for this reported issue
>> and made few observations:
>>
>> I see command pkg for Intel, Hpe, Msft and Hyperv families using a
>> similar layout of embedding nd_cmd_pkg at the head of the
>> command-pkg. struct nd_pdsm_cmd_pkg is following the same pattern.
>>
>> struct nd_pdsm_cmd_pkg {
>> struct nd_cmd_pkg hdr;
>> /* other members */
>> };
>>
>> struct ndn_pkg_msft {
>> struct nd_cmd_pkg gen;
>> /* other members */
>> };
>> struct nd_pkg_intel {
>> struct nd_cmd_pkg gen;
>> /* other members */
>> };
>> struct ndn_pkg_hpe1 {
>> struct nd_cmd_pkg gen;
>> /* other members */
[.]
>
> In those cases the other members are a union and there is no second
> variable length array. Perhaps that is why those definitions are not
> getting flagged? I'm not seeing anything in ndctl build options that
> would explicitly disable this warning, but I'm not sure if the ndctl
> build environment is missing this build warning by accident.

I tried building ndctl master with clang-10 with CC=clang and
CFLAGS="". Seeing the same warning messages reported for all command
package struct for existing command families.

./hpe1.h:334:20: warning: field 'gen' with variable sized type 'struct nd_cmd_pkg' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct nd_cmd_pkg gen;
^
./msft.h:59:20: warning: field 'gen' with variable sized type 'struct nd_cmd_pkg' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct nd_cmd_pkg gen;
^
./hyperv.h:34:20: warning: field 'gen' with variable sized type 'struct nd_cmd_pkg' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct nd_cmd_pkg gen;
^
>
> Those variable size payloads are also not being used in any code paths
> that would look at the size of the command payload, like the kernel
> ioctl() path. The payload validation code needs static sizes and the
> payload parsing code wants to cast the payload to a known type. I
> don't think you can use the same struct definition for both those
> cases which is why the ndctl parsing code uses the union layout, but
> the kernel command marshaling code does strict layering.
Even if I switch to union layout and replacing the flexible array 'payload'
at end to a fixed size array something like below, I still see
'-Wgnu-variable-sized-type-not-at-end' warning reported by clang:

union nd_pdsm_cmd_payload {
struct nd_papr_pdsm_health health;
__u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
};

struct nd_pdsm_cmd_pkg {
struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
__s32 cmd_status; /* Out: Sub-cmd status returned back */
__u16 reserved[2]; /* Ignored and to be used in future */
union nd_pdsm_cmd_payload payload;
} __attribute__((packed));


>
>> };
>>
>> Even though other command families implement similar command-package
>> layout they were not flagged (yet) as they are (I am guessing) serviced
>> in vendor acpi drivers rather than in kernel like in case of papr-scm
>> command family.
>
> I sincerely hope there are no vendor acpi kernel drivers outside of
> the upstream one.
Apologies if I was not clear. Was referring to nvdimm vendor uefi
drivers which ultimately service the DSM commands. Since CMD_CALL serves
as a conduit to send the command payload to these vendor drivers,
libnvdimm never needs to peek into the nd_cmd_pkg.payload
field. Consequently nfit module never hit this warning in kernel before.

>
>>
>> So, I think this issue is not just specific to papr-scm command family
>> introduced in this patch series but rather across all other command
>> families. Every other command family assumes 'struct nd_cmd_pkg_hdr' to
>> be embeddable and puts it at the beginning of their corresponding
>> command-packages. And its only a matter of time when someone tries
>> filtering/handling of vendor specific commands in nfit module when they
>> hit similar issue.
>>
>> Possible Solutions:
>>
>> * One way would be to redefine 'struct nd_cmd_pkg' to mark field
>> 'nd_payload[]' from a flexible array to zero sized array as
>> 'nd_payload[0]'.
>
> I just went through a round of removing the usage of buf[0] in ndctl
> since gcc10 now warns about that too.
>
>> This should make 'struct nd_cmd_pkg' embeddable and
>> clang shouldn't report 'gnu-variable-sized-type-not-at-end'
>> warning. Also I think this change shouldn't introduce any ABI change.
>>
>> * Another way to solve this issue might be to redefine 'struct
>> nd_pdsm_cmd_pkg' to below removing the 'struct nd_cmd_pkg' member. This
>> struct should immediately follow the 'struct nd_cmd_pkg' command package
>> when sent to libnvdimm:
>>
>> struct nd_pdsm_cmd_pkg {
>> __s32 cmd_status; /* Out: Sub-cmd status returned back */
>> __u16 reserved[2]; /* Ignored and to be used in future */
>> __u8 payload[];
>> };
>>
>> This should remove the flexible member nc_cmd_pkg.nd_payload from the
>> struct with just one remaining at the end. However this would make
>> accessing the [in|out|fw]_size members of 'struct nd_cmd_pkg'
>> difficult for the pdsm servicing functions.
>>
>>
>> Any other solution that you think, may solve this issue ?
>
> The union might help, but per the above I think only for parsing the
> command at which point I don't think the kernel needs a unified
> structure defining both the generic envelope and the end-point
> specific payload at once.

As I tested above, switching to union too will not solve the clang
warning.

Having a unified structure for a command family defining both
generic envelop and end-point specific payload, is what I see all the
existing command families doing.

However if I split 'struct nd_pdsm_cmd_pkg' to not have an embedded
'struct nd_cmd_pkg' then it goes opposite to what existing command family
implementations.

So to me it looks like no clear way to address this :-(

Another non-conventional way to fix this might be to suppress this clang
warning messages by adding "CFLAGS_papr_scm.o +=
-Wno-gnu-variable-sized-type-not-at-end" to papr_scm Makefile.

Current implementation 'struct nd_cmd_pkg' clearly depends on gcc
extension of having a flexible payload array which is allowed to be not
necessarily placed at the end of a containing struct. So the problem can be
attributed to difference in compiler implementations between GCC and
Clang rather than how 'struct nd_pdsm_cmd_pkg' and 'struct nd_cmd_pkg'
are laid out.

--
Cheers
~ Vaibhav