Re: [PATCH] cxl/acpi: Verify CHBS length for CXL2.0
From: Zhijian Li (Fujitsu)
Date: Fri Mar 28 2025 - 00:16:56 EST
On 27/03/2025 21:36, Dan Williams wrote:
> Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 27/03/2025 11:44, Ira Weiny wrote:
>>> Li Zhijian wrote:
>>>> Per CXL Spec r3.1 Table 9-21, both CXL1.1 and CXL2.0 have defined their
>>>> own length, verify it to avoid an invalid CHBS
>>>
>>>
>>> I think this looks fine. But did a platform have issues with this?
>>
>> Not really, actually, I discovered it while reviewing the code and
>> CXL specification.
>>
>> Currently, this issue arises only when I inject an incorrect length
>> via QEMU environment. Our hardware does not experience this problem.
>>
>>
>>> Does this need to be backported?
>> I remain neutral :)
>
> What does the kernel do with this invalid CHBS from QEMU? I would be
> happy to let whatever bad effect from injecting a corrupted CHBS just
> happen because there are plenty of ways for QEMU to confuse the kernel
> even if the table lengths are correct.
>
> Unless it has real impact I would rather not touch the kernel for every
> possible way that QEMU can make a mistake.
Thank you for the feedback.
If your earlier comments were specifically about ***backporting*** this patch,
I agree there might not be an urgent need for that.
However, on the question of whether this patch should be accepted
upstream, I believe it is necessary:
1. The **CXL Specification (r3.1, Table 9-21)** explicitly defines the `length`
the CHBS must report in both the CXL 1.1 and CXL 2.0 cases. If the kernel
does not validate this field against the spec, it may accept a malformed
CHBS entry as if it were valid.
2. Section **2.13.8** of the *CXL Memory Device Software Guide (Rev 1.0)* [1]
recommends verifying the CHBS length.
While the immediate impact might be limited to edge cases (e.g., incorrect QEMU configurations),
upstreaming this aligns the kernel with spec-mandated checks and improves
robustness for future use cases.
[1] https://cdrdv2-public.intel.com/643805/643805_CXL_Memory_Device_SW_Guide_Rev1_1.pdf
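
To make the check under discussion concrete, here is a minimal standalone sketch of a per-version length check. It is not the patch itself: `struct chbs_entry`, the `CHBS_*` macros and `chbs_length_valid()` are hypothetical names for illustration, and the expected lengths (0x2000 for CXL 1.1, 0x10000 for CXL 2.0) are assumed to match the values in Table 9-21. Rejecting unknown versions is a choice made by this sketch only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a CHBS entry; not the kernel's struct. */
struct chbs_entry {
	uint32_t cxl_version;	/* 0 = CXL 1.1, 1 = CXL 2.0 */
	uint64_t base;		/* base address of the register block */
	uint64_t length;	/* size of the register block in bytes */
};

/* Assumed values: 8 KB for CXL 1.1, 64 KB for CXL 2.0. */
#define CHBS_VERSION_CXL11	0u
#define CHBS_VERSION_CXL20	1u
#define CHBS_LENGTH_CXL11	0x2000ull
#define CHBS_LENGTH_CXL20	0x10000ull

/* Return true only if the length matches the one expected for the version. */
static bool chbs_length_valid(const struct chbs_entry *chbs)
{
	switch (chbs->cxl_version) {
	case CHBS_VERSION_CXL11:
		return chbs->length == CHBS_LENGTH_CXL11;
	case CHBS_VERSION_CXL20:
		return chbs->length == CHBS_LENGTH_CXL20;
	default:
		/* Unknown version: this sketch chooses to reject it. */
		return false;
	}
}
```

With such a check, a CXL 2.0 entry carrying the CXL 1.1 length (the kind of value I injected via QEMU) would be rejected rather than silently accepted.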
>
> I.e. if it was a widespread problem that affected multiple QEMU users by
> default then maybe. Just your local test gone awry? Maybe not.